Method and system for answer extraction

ABSTRACT

A document searching method including employing a computer to receive, from a user, a query including at least one search term, employing computerized answer retrieving functionality to generate document search terms including at least one additional search term not present in the query, which the at least one additional search term was acquired, prior to receipt by the computer of the query from the user, by the computerized answer retrieving functionality in response to at least one query in the form of a question; and operating computerized search engine functionality to access a set of documents in response to the query, based not only on at least one search term supplied by the user in the query, but also on the at least one additional search term provided by the computerized answer retrieving functionality.

FIELD OF THE INVENTION

The present invention relates to document searching methodologies and systems generally.

BACKGROUND OF THE INVENTION

The following patent publications are believed to represent the current state of the art:

U.S. Pat. Nos. 6,910,003; 6,584,470; 6,601,026; 6,560,590; 6,665,640; 6,615,172; 5,574,908; 6,901,399; 6,766,316; 6,758,397; 6,745,161; 6,676,014; 6,633,846; 6,616,047 and 6,491,217;

U.S. Patent Application Publication Nos. 2004/0243417; 2004/0111408; 2004/0083092; 2003/0182391 and 2002/0002452.

SUMMARY OF THE INVENTION

The present invention seeks to provide improved document searching methodologies and systems.

There is thus provided in accordance with a preferred embodiment of the present invention a document searching method including employing a computer to receive, from a user, a query including at least one search term, employing computerized answer retrieving functionality to generate document search terms including at least one additional search term not present in the query, which the at least one additional search term was acquired, prior to receipt by the computer of the query from the user, by the computerized answer retrieving functionality in response to at least one query in the form of a question; and operating computerized search engine functionality to access a set of documents in response to the query, based not only on at least one search term supplied by the user in the query, but also on the at least one additional search term provided by the computerized answer retrieving functionality.

There is also provided in accordance with another preferred embodiment of the present invention a system for document searching including a computer operative to receive, from a user, a query including at least one search term, computerized answer retrieving functionality operative to generate document search terms including at least one additional search term not present in the query, which the at least one additional search term was acquired, prior to receipt by the computer of the query from the user, by the computerized answer retrieving functionality in response to at least one query in the form of a question and computerized search engine functionality operative to access a set of documents in response to the query, based not only on the at least one search term but also on the at least one additional search term provided by the computerized answer retrieving functionality.

Preferably, the query is a question. Alternatively, the query is not a question.

Preferably, the employing computerized answer retrieving functionality provides the at least one additional search term by retrieving search terms, acquired other than in response to earlier questions, received by the computerized answer retrieving functionality prior to receipt of the query from the user.

There is further provided in accordance with yet another preferred embodiment of the present invention an answer extraction method including employing a computer to receive a question from a user, employing a computer network to access a set of documents relevant to the question by employing document search terms derived by the computer from the question, the document search terms including at least one additional search term not present in the question, which the at least one additional search term was acquired prior to receipt of the question from the user, analyzing the set of documents to extract at least one answer to the question; and providing the at least one answer to the user.

Preferably, the employing a computer network includes providing the at least one additional search term, by retrieving search terms acquired in response to earlier questions, received prior to receipt of the question from the user. Alternatively, the employing a computer network includes providing the at least one additional search term by retrieving search terms, acquired other than in response to earlier questions, received prior to receipt of the question from the user.

In a preferred embodiment of the present invention the employing a computer includes employing the computer to receive the query or question by at least one of typing the query or question, using a voice responsive input device, using a screen scraping functionality, using an email functionality, using an SMS functionality and using an instant messaging functionality.

Preferably, the employing computerized answer retrieving functionality to generate document search terms includes utilizing computerized query normalizing functionality for normalizing the query. Additionally, the normalizing the query is performed based at least in part on at least one of a plurality of query normalization rules.

Preferably, the employing computerized answer retrieving functionality to generate document search terms or the employing document search terms includes generating document search terms, including the at least one additional search term not present in the query or question by replacing at least one word in the query or question by at least one selected synonym thereof. Additionally, the replacing at least one word in the query or question by at least one selected synonym thereof includes employing computerized synonym retrieving functionality to identify the at least one selected synonym at least partially by reference to at least one word in the query or question other than the at least one word which is replaced by the at least one selected synonym. Additionally, the employing computerized synonym retrieving functionality includes identifying the at least one selected synonym by identifying a plurality of synonyms and selecting at least one of the plurality of synonyms for which there exists a phrase in a corpus which is relevant to the query or question. Additionally, the identifying the at least one selected synonym includes searching the corpus for occurrences of at least one of the plurality of synonyms for which there exists a phrase in the corpus which is relevant to the query or question and designating at least one of the plurality of synonyms as a selected synonym in accordance with a number of occurrences in the corpus of a phrase including the at least one of the plurality of synonyms which is relevant to the query or question.
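By way of non-limiting illustration, the corpus-based synonym selection described above may be sketched as follows. This is a minimal sketch under stated assumptions and not the actual implementation: the function name `select_synonyms`, the use of the immediately neighboring query words as the "phrase relevant to the query", and the in-memory list of strings standing in for the corpus are all hypothetical choices made for the example.

```python
import re
from typing import List, Sequence


def select_synonyms(query_words: Sequence[str], index: int,
                    candidates: Sequence[str], corpus: Sequence[str],
                    top_n: int = 1) -> List[str]:
    """Rank candidate synonyms for query_words[index] by corpus phrase counts.

    Assumption: the relevant phrase is the candidate synonym together with
    the query words immediately before and after the replaced word.
    """
    words = [w.lower() for w in query_words]
    lo, hi = max(0, index - 1), min(len(words), index + 2)
    counts = {}
    for cand in candidates:
        # Build the phrase by substituting the candidate into its query context.
        substituted = words[:index] + [cand.lower()] + words[index + 1:]
        phrase = " ".join(substituted[lo:hi])
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
        # Number of occurrences of the phrase across the whole corpus.
        counts[cand] = sum(len(pattern.findall(doc.lower())) for doc in corpus)
    # Keep only synonyms whose phrase exists in the corpus, most frequent first.
    ranked = sorted((c for c in candidates if counts[c] > 0),
                    key=lambda c: -counts[c])
    return ranked[:top_n]


corpus = [
    "Compare cheap auto insurance quotes online.",
    "Cheap auto insurance is hard to find.",
    "The cheap automobile insurance market grew.",
    "A horse-drawn carriage passed by.",
]
selected = select_synonyms(["cheap", "car", "insurance"], 1,
                           ["automobile", "auto", "carriage"], corpus)
```

In this example the word "car" is replaced by reference to the other query words "cheap" and "insurance"; "carriage" is rejected because no corresponding phrase exists in the corpus.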

Preferably, the document searching method also includes utilizing computerized query processing functionality to process the query prior to the operating computerized search engine functionality, the utilizing computerized query processing functionality including utilizing the computerized query processing functionality to generate at least one expected answer to the query, utilizing the computerized query processing functionality to generate at least one preliminary search engine query based on the at least one expected answer, utilizing the computerized query processing functionality to concatenate the at least one preliminary search engine query with the at least one additional search term not present in the query, thereby to form a concatenated search engine query and providing the concatenated search engine query to the computerized search engine functionality.
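By way of non-limiting illustration, the three query processing steps described above (expected answer, preliminary search engine query, concatenation with additional search terms) may be sketched as follows. The rules used here are assumptions made only for the example: the expected answer is formed by dropping a leading question word, and the preliminary query is formed by quoting the expected answer as an exact phrase. The function names are hypothetical.

```python
from typing import Sequence

# Hypothetical list of leading question words to strip.
QUESTION_WORDS = {"who", "what", "when", "where", "which", "why", "how"}


def expected_answer(question: str) -> str:
    """Assumed rule: drop the question mark and a leading question word."""
    words = question.rstrip("?").split()
    if words and words[0].lower() in QUESTION_WORDS:
        words = words[1:]
    return " ".join(words)


def preliminary_query(expected: str) -> str:
    """Assumed rule: search for the expected answer as an exact phrase."""
    return f'"{expected}"'


def concatenated_query(question: str, additional_terms: Sequence[str]) -> str:
    """Concatenate the preliminary query with previously acquired terms."""
    prelim = preliminary_query(expected_answer(question))
    return " ".join([prelim, *additional_terms])


query = concatenated_query("Who invented the telephone?", ["inventor", "patent"])
print(query)  # "invented the telephone" inventor patent
```

The additional terms "inventor" and "patent" stand in for search terms acquired prior to receipt of the query; how such terms are acquired and stored is outside the scope of this sketch.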

In accordance with another preferred embodiment the document searching method or the answer extraction method also includes providing a representation of at least one document in the set of documents to the user. Additionally, the providing a representation includes presenting at least one link to the at least one document.

Preferably, the document searching method also includes extracting at least one answer to the query from at least one document in the set of documents and providing the at least one answer to the user. Additionally, the extracting at least one answer includes analyzing the at least one document by carrying out theme extraction on the at least one document, the theme extraction utilizing statistical analysis of frequency of occurrence of words to identify at least one theme word of the at least one document, extracting sentences from the at least one document, selecting at least one of the sentences as a potential answer, scoring each of the at least one of the sentences selected as a potential answer and identifying the at least one of the sentences selected as a potential answer based at least partially on results of the scoring.

Preferably, the analyzing the set of documents to extract at least one answer to the question includes carrying out theme extraction on plural ones of the set of documents, the theme extraction utilizing statistical analysis of frequency of occurrence of words to identify at least one theme word of the at least one document, extracting sentences from the at least one document, selecting at least one of the sentences as a potential answer, scoring each of the at least one of the sentences and identifying at least one of the sentences selected as a potential answer based at least partially on results of the scoring.

Alternatively or additionally, the extracting at least one answer or the analyzing the set of documents to extract the at least one answer includes enhancing the at least one document by identifying capitalized phrases which appear in the at least one document, identifying designated capitalized words belonging to the capitalized phrases and adding, to the at least one document, adjacent each occurrence of a designated capitalized word that does not appear in a capitalized phrase, the designated capitalized word that does appear alongside thereof elsewhere in the document in a capitalized phrase and carrying out analysis of the at least one document in order to identify at least one portion thereof as a potential answer. Additionally or alternatively, the providing the at least one answer to the user includes presenting the at least one answer in an editable report precursor format.
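By way of non-limiting illustration, the document enhancement described above may be sketched as follows. The sketch rests on assumptions that are not taken from the specification: a capitalized phrase is taken to be a run of two or more space-separated capitalized words, the designated word is taken to be the last word of such a phrase, and the first phrase seen for a given designated word wins. Sentence-initial capitalization is not handled specially, so the heuristic is deliberately simplistic.

```python
import re

# A maximal run of one or more capitalized words separated by single spaces.
CAP_RUN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")


def enhance(text: str) -> str:
    """Expand lone designated words with their companions from a capitalized phrase."""
    companions = {}
    # Pass 1: collect multi-word capitalized phrases and their designated words.
    for match in CAP_RUN.finditer(text):
        words = match.group(0).split()
        if len(words) > 1:
            companions.setdefault(words[-1], " ".join(words[:-1]))

    # Pass 2: a designated word standing alone gets its companion word(s) added.
    def expand(match: re.Match) -> str:
        run = match.group(0)
        if " " not in run and run in companions:
            return f"{companions[run]} {run}"
        return run

    return CAP_RUN.sub(expand, text)


original = "The speech by Bill Clinton was long. After that, Clinton left."
enhanced = enhance(original)
print(enhanced)
```

Here the lone occurrence of "Clinton" in the second sentence is expanded to "Bill Clinton", so that later sentence-level analysis can match either word.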

Preferably, the employing computerized answer retrieving functionality includes employing artificial intelligence.

Preferably, the computerized answer retrieving functionality is operative to provide the at least one additional search term, by retrieving search terms acquired other than in response to earlier questions, received by the computerized answer retrieving functionality prior to receipt of the query from the user.

In a preferred embodiment of the present invention the computer is operative to receive the query or question from at least one of a keyboard, a voice responsive input device, a screen scraping functionality, an email functionality, an SMS functionality and an instant messaging functionality.

Preferably, the computerized answer retrieving functionality includes computerized query normalizing functionality for normalizing the query. Additionally, the computerized query normalizing functionality is operative to normalize the query based at least in part on at least one of a plurality of query normalization rules.

Preferably, the computerized answer retrieving functionality or the computerized answer extraction functionality is operative to generate the at least one additional search term not present in the query or question by replacing at least one word in the query or question by at least one selected synonym thereof. Additionally, the computerized answer retrieving functionality or the computerized answer extraction functionality includes computerized synonym retrieving functionality operative to identify the at least one selected synonym at least partially by reference to at least one word in the query or question other than the at least one word which is replaced by the at least one selected synonym. Additionally, the computerized synonym retrieving functionality includes a corpus and the computerized synonym retrieving functionality is operative to search the corpus for occurrences of at least one of a plurality of synonyms for which there exists a phrase relevant to the query or question and to designate at least one of the plurality of synonyms as a selected synonym in accordance with a number of occurrences in the corpus of a phrase including the at least one synonym which is relevant to the query or question.

Preferably, the system for document searching or the answer extraction system also includes a document output device for providing a representation of at least one document in the set of documents to the user. Additionally, the document output device includes a display for presenting at least one link to the at least one document.

In accordance with another preferred embodiment the system for document searching also includes computerized answer extraction functionality for extracting at least one answer from at least one document in the set of documents and an answer output device for providing the at least one answer to the user. Additionally, the computerized answer extraction functionality includes a document analyzer operative to analyze the at least one document, the document analyzer including computerized theme extraction functionality for carrying out theme extraction on the at least one document, the theme extraction utilizing statistical analysis of frequency of occurrence of words to identify at least one theme word of the at least one document, computerized sentence extracting functionality for extracting sentences from the at least one document, a potential answer selector for selecting at least one of the sentences as a potential answer, computerized scoring functionality for scoring each of the at least one of the sentences and a sentence identifier for identifying at least one of the sentences selected as a potential answer based at least partially on results of the scoring. Alternatively or additionally, the answer output device includes a display for presenting the at least one answer to the user in an editable report precursor format.

Preferably, the computerized answer retrieving functionality includes artificial intelligence.

Preferably, the employing a computer network employs artificial intelligence.

Preferably, the employing document search terms includes utilizing computerized question normalizing functionality for normalizing the question. Additionally, the normalizing the question is performed based at least in part on at least one of a plurality of question normalization rules.

Preferably, the answer extraction method also includes utilizing computerized question processing functionality to process the question, the utilizing computerized question processing functionality including utilizing the computerized question processing functionality to generate at least one expected answer to the question, utilizing the computerized question processing functionality to generate at least one preliminary search engine query based on the at least one expected answer, utilizing the computerized question processing functionality to concatenate the at least one preliminary search engine query with the at least one additional search term not present in the question, thereby to form a concatenated search engine query and deriving the document search terms from the concatenated search engine query.

Preferably, the providing the at least one answer to the user also includes providing a representation of at least one document of the set of documents to the user. Additionally, the providing a representation includes presenting at least one link to the at least one document.

In another preferred embodiment of the present invention the question is not phrased in question format.

There is even further provided in accordance with still another preferred embodiment of the present invention an answer extraction system including a computer operative to receive a question from a user, computerized answer extraction functionality operative to employ a computer network to access a set of documents relevant to the question by employing document search terms derived by the computer from the question, the document search terms including at least one additional search term not present in the question, which the at least one additional search term was acquired prior to receipt of the question from the user, computerized answer analysis functionality for analyzing the set of documents to extract at least one answer to the question and an output device operative to provide the at least one answer to the user.

Preferably, the computer network provides the at least one additional search term by retrieving search terms, acquired in response to earlier questions, received prior to receipt of the question from the user. Alternatively, the computer network provides the at least one additional search term by retrieving search terms, acquired other than in response to earlier questions, received prior to receipt of the question from the user. Additionally or alternatively, the computer network employs artificial intelligence.

Preferably, the computerized answer extraction functionality includes computerized question normalizing functionality for normalizing the question. Additionally, the computerized question normalizing functionality is operative to normalize the question based at least in part on at least one of a plurality of question normalization rules.

Preferably, the output device is operative to provide a representation of at least one document of the set of documents to the user. Additionally, the output device includes a display for presenting at least one link to the at least one document to the user.

Preferably, the computerized answer extraction functionality includes computerized theme extraction functionality for carrying out theme extraction on plural ones of the set of documents, the theme extraction utilizing statistical analysis of frequency of occurrence of words to identify at least one theme word of the at least one document, computerized sentence extracting functionality for extracting sentences from the at least one document, a potential answer selector for selecting at least one of the sentences as a potential answer, scoring functionality for scoring each of the at least one of the sentences and a sentence identifier for identifying at least one of the sentences selected as a potential answer based at least partially on results of the scoring.

There is also provided in accordance with another preferred embodiment of the present invention an answer extraction method including employing a computer to receive a question from a user, employing a computer network to access a set of documents relevant to the question by employing document search terms derived by the computer from the question, extracting at least one answer to the question and providing the at least one answer to the user, the extracting at least one answer including generating an expected answer to the question, the expected answer including question keywords, analyzing the set of documents by carrying out theme extraction on plural ones of the set of documents, the theme extraction utilizing statistical analysis of the frequency of occurrence of words to identify at least one theme word of a document, which theme word may or may not be a question keyword and extracting sentences from plural ones of the set of documents, selecting at least one of the sentences as a potential answer if it fulfills at least one of the following criteria: a sentence including at least a predetermined plurality of question keywords and a sentence including at least one question keyword and at least one theme word, scoring each of the at least one of the sentences selected as a potential answer and identifying at least one of the at least one of the sentences selected as a potential answer based at least partially on results of the scoring.
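By way of non-limiting illustration, the two selection criteria recited above may be sketched as a simple predicate. The tokenization, the function name `is_potential_answer` and the default value of two for the "predetermined plurality" are assumptions made for the example. Because a theme word may itself be a question keyword, this sketch reads the second criterion literally and allows a single word to satisfy both halves of it.

```python
import re
from typing import Set


def is_potential_answer(sentence: str, question_keywords: Set[str],
                        theme_words: Set[str], min_keywords: int = 2) -> bool:
    """Return True if the sentence meets at least one of the two criteria."""
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    keyword_hits = len(words & question_keywords)
    theme_hits = len(words & theme_words)
    # Criterion 1: at least a predetermined plurality of question keywords.
    if keyword_hits >= min_keywords:
        return True
    # Criterion 2: at least one question keyword and at least one theme word.
    return keyword_hits >= 1 and theme_hits >= 1


keywords = {"invented", "telephone"}
themes = {"bell"}
sentences = [
    "Alexander Graham Bell invented the telephone in 1876.",
    "Bell patented the telephone.",
    "Bell was born in Edinburgh.",
    "The telephone rang.",
]
selected = [s for s in sentences if is_potential_answer(s, keywords, themes)]
```

Only the first two sentences are selected: the first contains two question keywords, and the second contains one question keyword together with the theme word.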

Preferably, the answer extraction method also includes, prior to the employing a computer network to access a set of documents, utilizing computerized question normalization functionality for normalizing the question and thereafter, utilizing computerized question classification functionality to classify the question.

Preferably, the employing a computer network includes employing the computer to derive the document search terms, including at least one additional search term not present in the question, which the at least one additional search term was acquired prior to receipt of the question from the user. Alternatively, the employing a computer network includes employing the computer to derive the document search terms, including at least one additional search term not present in the question, by replacing at least one word in the question by at least one selected synonym thereof.

Preferably, the statistical analysis includes for each word in the document, stemming the word to a corresponding root word, generating a word occurrence frequency score for each different root word corresponding to a word in the document, using the word occurrence frequency scores to calculate a document word occurrence frequency indicating score for the document, selecting a subset of words in the document including at least one word having a word occurrence frequency score which is greater than or equal to the document word occurrence frequency indicating score. Additionally, the document word occurrence frequency indicating score includes at least one of an average of the word occurrence frequency scores and a median of the word occurrence frequency scores. Additionally or alternatively, the statistical analysis, the extracting a theme or the identifying at least one theme word includes selecting, as the at least one theme word, at least one word having a word occurrence frequency score which is greater than or equal to twice the document word occurrence frequency indicating score.

Preferably, the statistical analysis also includes following the selecting a subset of words in the document or the potential answer document, calculating a subset word occurrence frequency indicating score and selecting, as the at least one theme word, at least one of the subset of words having a word occurrence frequency score which is greater than or equal to the subset word occurrence frequency indicating score. Additionally, the subset word occurrence frequency indicating score includes at least one of an average of the word occurrence frequency scores of words in the subset of words and a median of the word occurrence frequency scores of words in the subset of words.
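By way of non-limiting illustration, the two-stage statistical analysis described above may be sketched as follows. The toy suffix-stripping `stem` function is an assumption standing in for a real stemmer, raw occurrence counts are assumed to serve as the word occurrence frequency scores, and the names `theme_words` and `strong_theme_words` are hypothetical. Stop-word removal is not recited above and is not performed here.

```python
import re
from collections import Counter
from statistics import mean, median
from typing import Callable, List


def stem(word: str) -> str:
    """Toy stemmer (an assumption): strip one common English suffix."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def root_frequencies(text: str) -> Counter:
    """Word occurrence frequency score for each different root word."""
    return Counter(stem(w) for w in re.findall(r"[a-z]+", text.lower()))


def theme_words(text: str, indicator: Callable = mean) -> List[str]:
    """Two-stage selection; `indicator` may be mean (average) or median."""
    freq = root_frequencies(text)
    if not freq:
        return []
    # Stage 1: keep roots scoring at or above the document indicating score.
    doc_score = indicator(list(freq.values()))
    subset = {w: c for w, c in freq.items() if c >= doc_score}
    # Stage 2: keep roots scoring at or above the subset indicating score.
    subset_score = indicator(list(subset.values()))
    return sorted(w for w, c in subset.items() if c >= subset_score)


def strong_theme_words(text: str, indicator: Callable = mean) -> List[str]:
    """Alternative: roots scoring at least twice the document indicating score."""
    freq = root_frequencies(text)
    if not freq:
        return []
    doc_score = indicator(list(freq.values()))
    return sorted(w for w, c in freq.items() if c >= 2 * doc_score)


themes = theme_words("cats cat cats dog dogs bird")
```

In this example the root counts are cat 3, dog 2 and bird 1, so the document average is 2 and the subset is {cat, dog}; the subset average is 2.5, leaving "cat" as the only theme word. Passing `indicator=median` selects the median-based variant.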

There is further provided in accordance with still another preferred embodiment of the present invention an answer extraction system including a computer operative to receive a question from a user and computerized answer extraction functionality operative to employ a computer network to access a set of documents relevant to the question by employing document search terms derived by the computer from the question, to extract at least one answer to the question and to provide the at least one answer to the user, the computerized answer extraction functionality including an expected answer generator operative to generate an expected answer to the question, the expected answer including question keywords, a document analyzer operative to carry out theme extraction on plural ones of the set of documents, the theme extraction utilizing statistical analysis of the frequency of occurrence of words in a document to identify at least one theme word of the document, which theme word may or may not be a question keyword, a sentence extractor, operative to extract sentences from plural ones of the set of documents, a potential answer selector, operative to select at least one of the sentences as a potential answer if it fulfills at least one of the following criteria: a sentence including at least a predetermined plurality of question keywords and a sentence including at least one question keyword and at least one theme word and a potential answer identifier, operative to calculate a score for each of the at least one of the sentences selected as a potential answer and to identify at least one of the sentences selected as a potential answer based at least partially on the score.

Preferably, the answer extraction system also includes computerized question normalizing functionality operative to normalize the question and computerized question classification functionality for classifying the question.

Preferably, the computerized answer extraction functionality is operative to employ the computer to derive the document search terms, including at least one additional search term not present in the question, which the at least one additional search term was acquired prior to receipt of the question from the user. Alternatively, the computerized answer extraction functionality is operative to employ the computer to derive the document search terms, including at least one additional search term not present in the question, by replacing at least one word in the question by at least one selected synonym thereof.

Preferably, the answer extraction system also includes an answer output device for providing the at least one answer to the user.

Preferably, the document analyzer or the computerized theme word identifying functionality includes computerized word stemming functionality, operative, for each word in the document, to stem the word to a corresponding root word, a word occurrence frequency score generator for generating a word occurrence frequency score for each different root word corresponding to a word in the document, computerized document word occurrence frequency indicating score calculating functionality operative to use the word occurrence frequency scores to calculate a document word occurrence frequency indicating score for the document and computerized word selecting functionality operative to select a subset of words in the document including at least one word having a word occurrence frequency score which is greater than or equal to the document word occurrence frequency indicating score. Additionally, the computerized document word occurrence frequency indicating score calculating functionality is operative to calculate the document word occurrence frequency indicating score by calculating at least one of an average of the word occurrence frequency scores and a median of the word occurrence frequency scores.

Additionally or alternatively, the computerized word selecting functionality, the computerized theme extraction functionality or the computerized theme word identifying functionality is operative to select, as the at least one theme word, at least one word having a word occurrence frequency score which is greater than or equal to twice the document word occurrence frequency indicating score.

Preferably, the document analyzer, the answer extraction system or the computerized question generation system also includes computerized subset word occurrence frequency indicating score calculating functionality, operative to calculate a subset word occurrence frequency indicating score and computerized theme word selection functionality operative to select, as the at least one theme word, at least one of the subset of words having a word occurrence frequency score which is greater than or equal to the subset word occurrence frequency indicating score. Additionally, the computerized subset word occurrence frequency indicating score calculating functionality is operative to calculate the subset word occurrence frequency indicating score by calculating at least one of an average of the word occurrence frequency scores of words in the subset of words and a median of the word occurrence frequency scores of words in the subset of words.

There is yet further provided in accordance with yet another preferred embodiment of the present invention an answer extraction method including employing a computer to receive a question from a user, employing a computer network to access a set of documents relevant to the question by employing document search terms derived by the computer from the question, extracting at least one answer to the question and providing the at least one answer to the user, the extracting at least one answer including enhancing at least one of the set of documents by identifying capitalized phrases which appear in the at least one document, identifying designated capitalized words belonging to the capitalized phrases and adding, to the at least one document adjacent each occurrence of a designated capitalized word that does not appear in a capitalized phrase, the designated capitalized word that does appear alongside thereof elsewhere in the document in a capitalized phrase and carrying out analysis of the at least one document in order to identify at least one portion thereof as a potential answer.

Preferably, the extracting at least one answer also includes, prior to the enhancing, generating an expected answer to the question, the expected answer including question keywords, and wherein the carrying out analysis of the at least one document includes carrying out theme extraction on the at least one document, the theme extraction utilizing statistical analysis of the frequency of occurrence of words to identify at least one theme word of the at least one document, which theme word may or may not be a question keyword, extracting sentences from the at least one document, selecting at least one of the sentences as a potential answer if it fulfills at least one of the following criteria: a sentence including at least a predetermined plurality of question keywords and a sentence including at least one question keyword and at least one theme word, scoring each of the at least one of the sentences selected as a potential answer and identifying at least one of the sentences selected as a potential answer based at least partially on results of the scoring.

Preferably, the statistical analysis includes for each word in the at least one document, stemming the word to a corresponding root word, generating a word occurrence frequency score for each different root word corresponding to a word in the at least one document, using the word occurrence frequency scores to calculate a document word occurrence frequency indicating score for the at least one document and selecting as potential theme words a subset of words in the at least one document including at least one word having a word occurrence frequency score which is greater than or equal to the document word occurrence frequency indicating score.

Preferably, the selecting as potential theme words includes selecting, as the at least one theme word, at least one word having a word occurrence frequency score which is greater than or equal to twice the document word occurrence frequency indicating score. Additionally, the statistical analysis also includes, following the selecting as potential theme words a subset of words in the at least one document, calculating a subset word occurrence frequency indicating score and selecting, as the at least one theme word, at least one of the subset of words having a word occurrence frequency score which is greater than or equal to the subset word occurrence frequency indicating score.

There is even further provided in accordance with another preferred embodiment of the present invention an answer extraction system including a computer operative to receive a question from a user, computerized answer extraction functionality operative to employ a computer network to access a set of documents relevant to the question by employing document search terms derived by the computer from the question, to extract at least one answer to the question and to provide the at least one answer to the user, the computerized answer extraction functionality including a document analyzer operative to identify capitalized phrases which appear in a document belonging to the set of documents, to identify designated capitalized words belonging to the capitalized phrases, to add to the document adjacent each occurrence of a designated capitalized word that does not appear in a capitalized phrase, the designated capitalized word that does appear alongside thereof elsewhere in the document in a capitalized phrase, thereby providing an enhanced document, and to carry out analysis of the enhanced document in order to identify at least one portion thereof as a potential answer.

Preferably, the computerized answer extraction functionality also includes an expected answer generator operative to generate an expected answer to the question, the expected answer including question keywords, and wherein the document analyzer or the computerized document analysis functionality includes computerized theme extraction functionality for carrying out theme extraction on the document or the enhanced document, the theme extraction utilizing statistical analysis of the frequency of occurrence of words to identify at least one theme word of the document or enhanced document, which theme word may or may not be a question keyword, a sentence extractor, operative to extract sentences from the document or enhanced document, a potential answer selector, operative to select at least one of the sentences as a potential answer if it fulfills at least one of the following criteria: a sentence including at least a predetermined plurality of question keywords and a sentence including at least one question keyword and at least one theme word and a potential answer identifier, operative to calculate a score for each of the at least one of the sentences and to identify at least one of the sentences selected as a potential answer based at least partially on results of the score.

There is yet further provided in accordance with another preferred embodiment of the present invention an answer extraction method including employing a computer to receive a question from a user, employing a computer network to access a set of documents relevant to the question by employing document search terms derived by the computer from the question, extracting at least one answer to the question and providing the at least one answer to the user, the extracting at least one answer to the question including identifying a multiplicity of potential answers and evaluating each of the multiplicity of potential answers according to at least one of the following criteria: proximity of question keywords in the potential answer, proximity of classification words and nouns in the potential answer and word count of at least part of the potential answer.

Alternatively, the evaluating includes evaluating each of the multiplicity of potential answers according to at least two of the following criteria, all of the following criteria or a combination of the following criteria: proximity of question keywords in the potential answer, proximity of classification words and nouns in the potential answer and word count of at least part of the potential answer.

Additionally or alternatively, the extracting at least one answer also includes selecting a sub group of the multiplicity of potential answers based on an evaluation of the multiplicity of potential answers in accordance with the criteria. Additionally, the evaluation includes scoring the multiplicity of potential answers in accordance with the criteria.

Preferably, the answer extraction method also includes forming a potential answer document by combining the multiplicity of potential answers, extracting a theme of the sub group of the multiplicity of potential answers, by utilizing statistical analysis of the frequency of occurrence of words in the potential answer document to identify at least one theme word in the sub group of the multiplicity of potential answers, which theme word may or may not be a question keyword and discarding potential answers belonging to the sub group of the multiplicity of potential answers which do not include at least one of the at least one theme word.

Preferably, the statistical analysis includes for each word in the potential answer document, stemming the word to a corresponding root word, generating a word occurrence frequency score for each different root word corresponding to a word in the potential answer document, using the word occurrence frequency scores to calculate a document word occurrence frequency indicating score for the potential answer document and selecting a subset of words in the potential answer document including at least one word having a word occurrence frequency score which is greater than or equal to the document word occurrence frequency indicating score.

Preferably, the providing the at least one answer to the user includes providing the at least one answer to the user in an order governed at least in part by at least one of a word count of each of the at least one answer and a score resulting from application to each of the at least one answer of at least one of the following criteria: proximity of question keywords in the at least one answer, proximity of classification words and nouns in the at least one answer and word count of at least part of the at least one answer.

Preferably, the identifying a multiplicity of potential answers also includes enhancing at least one of the set of documents by identifying capitalized phrases which appear in the at least one of the set of documents, identifying designated capitalized words belonging to the capitalized phrases and adding, to the at least one of the set of documents adjacent each occurrence of a designated capitalized word that does not appear in a capitalized phrase, the designated capitalized word that does appear alongside thereof elsewhere in the document in a capitalized phrase and carrying out analysis of the at least one of the set of documents in order to identify at least one portion thereof as a potential answer. Additionally, the identifying a multiplicity of potential answers also includes, prior to the enhancing, generating an expected answer to the question, the expected answer including question keywords, and wherein the carrying out analysis includes carrying out theme extraction on the at least one of the set of documents, the theme extraction utilizing statistical analysis of the frequency of occurrence of words to identify at least one theme word of the at least one of the set of documents, which theme word may or may not be a question keyword, extracting sentences from the at least one of the set of documents, selecting at least one of the sentences as a potential answer if it fulfills at least one of the following criteria: a sentence including at least a predetermined plurality of question keywords and a sentence including at least one question keyword and at least one theme word, scoring each of the at least one of the sentences selected as a potential answer and identifying at least one of the sentences selected as a potential answer based at least partially on results of the scoring.

There is also provided in accordance with still another preferred embodiment of the present invention an answer extraction system including a computer operative to receive a question from a user, computerized answer extraction functionality operative to employ a computer network to access a set of documents relevant to the question by employing document search terms derived by the computer from the question, to extract at least one answer to the question and to provide the at least one answer to the user, the computerized answer extraction functionality being operative to identify a multiplicity of potential answers and to evaluate each of the multiplicity of potential answers according to at least one of the following criteria: proximity of question keywords in the potential answer, proximity of classification words and nouns in the potential answer and word count of at least part of the potential answer.

Alternatively, the computerized answer extraction functionality is operative to evaluate each of the multiplicity of potential answers according to at least two of the following criteria, all of the following criteria or a combination of the following criteria: proximity of question keywords in the potential answer, proximity of classification words and nouns in the potential answer and word count of at least part of the potential answer. Additionally, the computerized answer extraction functionality is operative to select a sub group of the multiplicity of potential answers based on an evaluation of the multiplicity of potential answers in accordance with the criteria.

Preferably, the evaluation includes scoring the multiplicity of potential answers in accordance with the criteria. Additionally, the answer extraction system also includes computerized potential answer combining functionality operative to form a potential answer document by combining the multiplicity of potential answers, computerized theme extraction functionality for carrying out theme extraction on the sub group of the multiplicity of potential answers, the theme extraction utilizing statistical analysis of the frequency of occurrence of words in the potential answer document to identify at least one theme word in the sub group of the multiplicity of potential answers, which theme word may or may not be a question keyword and computerized potential answer discarding functionality operative to discard potential answers belonging to the sub group of the multiplicity of potential answers which do not include at least one of the at least one theme word.

Preferably, the computerized theme extraction functionality includes computerized word stemming functionality, operative, for each word in the potential answers document, to stem the word to a corresponding root word, a word occurrence frequency score generator for generating a word occurrence frequency score for each different root word corresponding to a word in the potential answers document, computerized document word occurrence frequency indicating score calculating functionality operative to use the word occurrence frequency scores to calculate a document word occurrence frequency indicating score for the potential answers document and computerized word selecting functionality operative to select a subset of words in the potential answers document including at least one word having a word occurrence frequency score which is greater than or equal to the document word occurrence frequency indicating score.

Preferably, the computerized answer extraction functionality provides the at least one answer to the user in an order governed at least in part by at least one of a word count of each one of the at least one answer and a score, resulting from application to each one of the at least one answer of at least one of the following criteria: proximity of question keywords in the at least one answer, proximity of classification words and nouns in the at least one answer and word count of at least part of the at least one answer.

Preferably, the computerized answer extraction functionality includes computerized document analysis functionality operative to identify capitalized phrases which appear in at least one of the set of documents, to identify designated capitalized words belonging to the capitalized phrases and to add to the at least one of the set of documents, adjacent each occurrence of a designated capitalized word that does not appear in a capitalized phrase, the designated capitalized word that does appear alongside thereof elsewhere in the at least one of the set of documents in a capitalized phrase, thereby providing an enhanced document, and to carry out analysis of the enhanced document in order to identify at least one portion thereof as a potential answer.

There is further provided in accordance with yet another preferred embodiment of the present invention a document searching method including employing a computer to receive a query including at least one search term from a user and employing computerized synonym retrieving functionality operative in response to queries to generate document search terms including at least one additional search term not present in the query, the computerized synonym retrieving functionality being operative to generate the at least one additional search term by replacing at least one word in the query by at least one selected synonym thereof and operating computerized search engine functionality to access a set of documents in response to the query, based on at least one of the at least one search term supplied by a user and the at least one additional search term provided by the computerized synonym retrieving functionality, the computerized synonym retrieving functionality being operative to identify the at least one selected synonym at least partially by reference to at least one word in the query other than the at least one word.

Preferably, the computerized synonym retrieving functionality is operative to identify the at least one selected synonym by identifying a plurality of synonyms and selecting at least one of the plurality of synonyms for which there exists a phrase relevant to the query in a corpus. Additionally, the computerized synonym retrieving functionality or the synonym selector is operative to identify the selected synonym by searching the corpus for occurrences of the at least one of the plurality of synonyms for which there exists a phrase relevant to the query and designating at least one of the plurality of synonyms as a selected synonym in accordance with the number of occurrences in the corpus of a phrase including the at least one of the plurality of synonyms which is relevant to the query.

Preferably, the at least one word in the query which is replaced by the at least one selected synonym thereof includes at least one of a noun, a verb, an object of a verb and a subject of a verb.

There is still further provided in accordance with yet another preferred embodiment of the present invention a document searching system including a computer operative to receive a query including at least one search term from a user, computerized synonym retrieving functionality operative, in response to queries, to generate document search terms, including at least one additional search term not present in the query and to generate the at least one additional search term by replacing at least one word in the query by at least one selected synonym thereof and computerized search engine functionality operative to access a set of documents in response to the query, based on at least one of the at least one search term supplied by a user and the at least one additional search term provided by the computerized synonym retrieving functionality, the computerized synonym retrieving functionality being operative to identify the selected synonym at least partially by reference to a word in the query other than the at least one word.

Preferably, the computerized synonym retrieving functionality includes a synonym selector operative to identify a plurality of synonyms and to select at least one of the plurality of synonyms for which there exists a phrase relevant to the query in a corpus.

There is even further provided in accordance with still another preferred embodiment of the present invention a computerized synonym generating method including receiving a stream of words, employing a computer for generating a list of synonyms for at least one word in the stream of words, employing a computer for searching a corpus for synonym-containing phrases including at least one synonym in the list of synonyms together with at least part of the stream of words, employing a computer for evaluating the frequency of occurrence of each of the synonym-containing phrases and proposing at least one selected synonym which forms part of a synonym-containing phrase having a relatively high frequency of occurrence in the corpus.

Preferably, the computerized synonym generating method also includes employing a computer for searching the corpus for received phrases including the at least one word together with the at least part of the stream of words, employing a computer for comparing the frequency of occurrence of the received phrases in the corpus with the frequency of occurrence of the synonym-containing phrases and proposing at least one selected synonym which forms part of a synonym-containing phrase only if the frequency of occurrence of the synonym-containing phrase exceeds the frequency of occurrence of the received phrase. Additionally, the at least one word includes at least one of a noun, a verb, an object of a verb and a subject of a verb.
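By way of non-limiting illustration, the comparison of phrase frequencies described above may be sketched as follows. The sketch assumes a corpus held in memory as a list of strings and counts phrases by plain substring matching, which is a simplification. The names `count_phrase` and `propose_synonyms` are hypothetical, and the list of candidate synonyms is assumed to be supplied by some external source such as a thesaurus.

```python
def count_phrase(corpus, phrase):
    # Simplification (assumption): case-insensitive substring counting,
    # which ignores word boundaries.
    needle = phrase.lower()
    return sum(doc.lower().count(needle) for doc in corpus)


def propose_synonyms(words, index, synonyms, corpus):
    """Propose synonyms for words[index] whose synonym-containing phrase
    occurs in the corpus more often than the received phrase."""
    baseline = count_phrase(corpus, " ".join(words))
    scored = []
    for synonym in synonyms:
        candidate = " ".join(words[:index] + [synonym] + words[index + 1:])
        frequency = count_phrase(corpus, candidate)
        if frequency > baseline:
            scored.append((frequency, synonym))
    # Most frequent phrase first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [synonym for _, synonym in scored]
```

For example, given a corpus in which "red planet" occurs more often than "crimson planet", the call `propose_synonyms(["crimson", "planet"], 0, ["red", "scarlet"], corpus)` would propose "red" and reject "scarlet".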

There is also provided in accordance with another preferred embodiment of the present invention a computerized synonym generating system including a computer operative to generate a list of synonyms for at least one word in a stream of words received from a user, computerized searching functionality operative to search a corpus for synonym-containing phrases including at least one synonym in the list of synonyms together with at least part of the stream of words, computerized frequency evaluation functionality operative to evaluate the frequency of occurrence of each of the synonym-containing phrases and computerized synonym providing functionality operative to propose at least one selected synonym which forms part of a synonym-containing phrase having a relatively high frequency of occurrence in the corpus.

Preferably, the computerized synonym generating system also includes computerized received phrases searching functionality operative to search the corpus for received phrases including the at least one word together with the at least part of the stream of words and computerized occurrence frequency comparing functionality operative to compare the frequency of occurrence of the received phrases in the corpus with the frequency of occurrence of the synonym-containing phrases, the computerized synonym providing functionality being operative to propose at least one selected synonym which forms part of a synonym-containing phrase only if the frequency of occurrence of the synonym-containing phrase exceeds the frequency of occurrence of the received phrase.

There is further provided in accordance with still another preferred embodiment of the present invention a computerized question generation method including identifying at least one theme word in a document, searching for previously asked questions containing the at least one theme word or having previously generated answers containing the at least one theme word and presenting the previously asked questions.

Preferably, the computerized question generation method also includes, prior to the identifying, employing a computer to obtain the document from a user, and the presenting includes presenting the previously asked questions on the computer to the user. Additionally or alternatively, the identifying includes carrying out statistical analysis of the frequency of occurrence of words in the document.

Preferably, the carrying out statistical analysis includes for each word in the document, stemming the word to a corresponding root word, generating a word occurrence frequency score for each different root word corresponding to a word in the document, using the word occurrence frequency scores to calculate a document word occurrence frequency indicating score for the document and selecting a subset of words in the document including at least one word having a word occurrence frequency score which is greater than or equal to at least the document word occurrence frequency indicating score.

There is yet further provided in accordance with yet another preferred embodiment of the present invention a computerized question generation system including computerized theme word identifying functionality for identifying at least one theme word in a document, computerized previous answer searching functionality operative to search for previously asked questions containing the at least one theme word or having previously generated answers containing the at least one theme word, and an output device for providing the previously asked questions.

Preferably, the computerized theme word identifying functionality is operative to carry out statistical analysis of the frequency of occurrence of words in the document.

There is also provided in accordance with another preferred embodiment of the present invention a computerized editable report precursor generating method including inputting at least one question into a computer, employing the computer to obtain at least one answer to the at least one question, storing the at least one answer to the at least one question, presenting the at least one question to the at least one answer in an editable form on the computer as an editable report precursor, archiving a multiplicity of the editable report precursors and following the archiving, employing the multiplicity of editable report precursors to enhance the employing the computer.

Preferably, the archiving includes archiving edited versions of the multiplicity of editable report precursors and the edited versions are also employed to enhance the employing the computer. Additionally, the inputting includes inputting the at least one question to the computer by at least one of typing the question, using a voice responsive input device, using a screen scraping functionality, using an email functionality, using an SMS functionality and using an instant messaging functionality.

Preferably, the employing the computer includes employing computerized answer retrieving functionality to generate document search terms including at least one additional search term not present in the question, which the additional search term was acquired, prior to receipt by the computer of the question from the user, by the computerized answer retrieving functionality in response to the at least one question and operating computerized search engine functionality to access a set of documents in response to the question, based not only on at least one search term supplied by a user but also on the at least one additional search term provided by the at least one computerized answer retrieving functionality.

There is yet further provided in accordance with still another preferred embodiment of the present invention a computerized editable report precursor generating method including inputting at least one desired report subject identifier into a computer, employing the computer to generate at least one question related to a desired subject identified by the at least one desired report subject identifier, employing the computer to obtain at least one answer to the at least one question and presenting the at least one question to the at least one answer in an editable form on the computer, thereby providing an editable report precursor.

Preferably, the computerized editable report precursor generating method also includes archiving a multiplicity of the editable report precursors and following the archiving, employing the multiplicity of editable report precursors to enhance at least one of the employing the computer to generate at least one question and the employing the computer to obtain at least one answer to the at least one question. Additionally or alternatively, the archiving includes archiving edited versions of the multiplicity of editable report precursors and wherein the edited versions are also employed to enhance at least one of the employing the computer to generate at least one question and the employing the computer to obtain at least one answer to the at least one question.

Preferably, the inputting includes inputting the at least one desired report subject identifier to the computer by at least one of typing the desired report subject identifier, using a voice responsive input device, using a screen scraping functionality, using an email functionality, using an SMS functionality and using an instant messaging functionality.

Preferably, the employing the computer to generate the at least one question includes employing the desired report subject identifier to search for previously asked questions containing at least part of the desired report subject identifier or having previously generated answers containing at least part of the desired report subject identifier.

Preferably, the employing the computer includes employing computerized answer retrieving functionality to generate document search terms including at least one additional search term not present in the question, which the additional search term was acquired, prior to receipt by the computer of the desired report subject identifier from the user, by the computerized answer retrieving functionality in response to at least one query, operating computerized search engine functionality to access a set of documents in response to the question, based not only on the desired report subject identifier but also on the at least one additional search term provided by the at least one computerized answer retrieving functionality.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings in which:

FIG. 1 is a simplified illustration of document searching functionality operative in accordance with a preferred embodiment of the present invention;

FIG. 2 is a simplified flow chart of the document searching functionality of FIG. 1;

FIG. 3 is a simplified flow chart of answer extraction methodology which forms part of the document searching functionality of FIGS. 1 & 2;

FIG. 4 is a simplified illustration of a question generating functionality operative in accordance with another preferred embodiment of the present invention;

FIG. 5 is a simplified flow chart of the question generating functionality of FIG. 4; and

FIG. 6 is a simplified illustration of report precursor-generating functionality operative in accordance with yet another preferred embodiment of the present invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

Throughout the specification and claims, certain defined terms have specific meanings as set forth hereinbelow:

Stopwords are defined as very common words which are useless in searching or indexing documents. Stopwords generally include articles, adverbials and adpositions. Some obvious stopwords are “a”, “of”, “the”, “I”, “it”, “you”, and “and”.

Keywords are defined as all the words in a sentence or phrase, such as in a question or other query, that are not stopwords. Keywords generally include all the nouns in a sentence or phrase, as well as verbs and adjectives.

Question Keywords and Query Keywords are Keywords that appear in a question or query.
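By way of non-limiting illustration, the relationship between stopwords and keywords defined above may be sketched as follows. The stopword set contains only the examples named above and is not a complete list. The names `STOPWORDS` and `extract_keywords` are hypothetical.

```python
import re

# Only the example stopwords named in the definition above; a practical
# list would be considerably longer (assumption).
STOPWORDS = {"a", "of", "the", "i", "it", "you", "and"}


def extract_keywords(text, stopwords=STOPWORDS):
    """Return, in order of appearance, the words of text that are not stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in stopwords]
```

Applied to a question or query, the returned list corresponds to the Question Keywords or Query Keywords.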

Phrases are defined as a collection of words.

Throughout, phrases, indicated by inclusion in quotation marks “ ”, are processed by a computerized methodology as complete phrases. Other collections of words, such as those joined by symbols such as + and &, are processed by the computerized methodology as separate terms connected by Boolean operators.
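By way of non-limiting illustration, the distinction drawn above between quoted phrases and operator-joined terms may be sketched as follows. The sketch accepts both straight and typographic quotation marks and treats + and & alike as separators, which is an assumption. The names `TOKEN` and `parse_query` are hypothetical.

```python
import re

# A quoted run is captured whole as a phrase; any other run of characters
# free of +, &, whitespace and quotation marks is captured as a term.
TOKEN = re.compile(r'[\u201c"]([^\u201d"]*)[\u201d"]|([^+&\s\u201c\u201d"]+)')


def parse_query(query):
    """Split a query into ("phrase", text) and ("term", text) pairs."""
    parts = []
    for match in TOKEN.finditer(query):
        if match.group(1) is not None:
            parts.append(("phrase", match.group(1).strip()))
        else:
            parts.append(("term", match.group(2)))
    return parts
```

A search back end could then match each "phrase" entry as an exact sequence of words and combine the "term" entries with Boolean operators.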

Reference is now made to FIG. 1, which is a simplified illustration of a typical document searching methodology operative in accordance with a preferred embodiment of the present invention. As seen in FIG. 1, a user operating a client computer 100, employs a conventional web browser such as Microsoft® Internet Explorer® to access a web page 102 containing a search input box 104. The user enters a query, preferably a question such as “HOW COME MARS IS RED?”, in the search input box 104.

Alternatively, any other suitable methodology may be employed for entering the query, such as the use of a voice responsive input device, a screen scraping functionality, an email functionality, an SMS functionality or an instant messaging functionality.

The question is supplied, typically via the Internet, to a query processing server 110, which normalizes the question, as described hereinbelow in greater detail, and provides a normalized question output, such as “WHY IS MARS RED?”.

In accordance with a preferred embodiment of the present invention, as part of the normalizing functionality, server 110 is operative, in response to a query, to generate document search terms including at least one additional search term not present in the query by replacing at least one word in the query by at least one selected synonym thereof.

In accordance with a preferred embodiment of the present invention, the normalized question output is supplied to a previous answer retrieval server 112, which provides an output of keywords previously given in answers to the same question or a similar question. However, it is possible that such keywords will not be found. It is appreciated that the functionality of server 112 may be carried out by server 110, thus obviating server 112.

The output of server 112 may typically be a string of words or phrases such as IRON OXIDE, RUST and IRON.

Server 110 generates at least one expected answer to the question and on the basis of the expected answer generates a plurality of preliminary search engine queries, such as “MARS IS RED BECAUSE OF”, “MARS IS RED BECAUSE”, MARS+RED+BECAUSE and MARS+RED.

In accordance with a preferred embodiment of the present invention, server 110 concatenates the preliminary search engine queries with the outputs of server 112, thus providing a plurality of concatenated search engine queries, typically:

“MARS IS RED BECAUSE OF”+“IRON OXIDE”+RUST+IRON; “MARS IS RED BECAUSE”+“IRON OXIDE”+RUST+IRON; MARS+RED+BECAUSE+“IRON OXIDE”+RUST+IRON; and MARS+RED+“IRON OXIDE”+RUST+IRON.
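By way of non-limiting illustration, the concatenation of preliminary search engine queries with previously retrieved keywords may be sketched as follows. The function name `concatenate_queries` is hypothetical, straight quotation marks stand in for the typographic ones, and the choice to leave the queries unchanged when no previous keywords are found is an assumption.

```python
def concatenate_queries(preliminary_queries, previous_keywords):
    """Append previously retrieved keywords to each preliminary query."""

    def as_term(keyword):
        # Multi-word keywords are quoted so that they are searched as phrases.
        return f'"{keyword}"' if " " in keyword else keyword

    suffix = "+".join(as_term(keyword) for keyword in previous_keywords)
    if not suffix:
        # Assumption: with no previous keywords, queries are used unchanged.
        return list(preliminary_queries)
    return [f"{query}+{suffix}" for query in preliminary_queries]


preliminary = [
    '"MARS IS RED BECAUSE OF"',
    '"MARS IS RED BECAUSE"',
    "MARS+RED+BECAUSE",
    "MARS+RED",
]
concatenated = concatenate_queries(preliminary, ["IRON OXIDE", "RUST", "IRON"])
```

With the inputs shown, `concatenated` holds four queries matching the four concatenated search engine queries listed above.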

Server 110 communicates via the Internet with a conventional search engine server 120, such as an Answers.com™, GOOGLE® or YAHOO® server, which performs a web search in accordance with the concatenated search engine queries. The search engine server typically provides search results to server 110 in the form of links to relevant documents, such as the following links:

http://solarsystem.nasa.gov/planets/profile.cfm?Object=Mars&Display=Kids

http://schools.mukliteo.wednet.edu/me/staff/bullocksk/FQA/why_is_red.htm

It is appreciated that the functionality of search engine server 120 may be carried out by using a local search engine index located on server 110, thus obviating server 120.

Server 110 retrieves the documents identified by the links received from the search engine server 120. In accordance with a preferred embodiment of the present invention, server 110 carries out answer extraction including, inter alia, the following functionality:

Extracting at least one answer to a question by generating an expected answer to the question, where the expected answer includes question keywords; analyzing the documents identified by the search engine by carrying out theme extraction on plural ones of the set of documents; and extracting sentences from plural ones of the set of documents. The theme extraction utilizes statistical analysis of the frequency of occurrence of words to identify at least one theme word of a document, which may or may not be a question keyword.

Selecting at least one of the sentences as a potential answer if it fulfills at least one of the following criteria: a sentence including at least a predetermined plurality of question keywords and a sentence including at least one question keyword and at least one theme word.

Scoring each sentence selected as a potential answer; and

Identifying at least one of the sentences selected as a potential answer based at least partially on results of the scoring.
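By way of non-limiting illustration, the two selection criteria listed above may be sketched as follows. The sentence splitter is a naive one based on terminal punctuation, and the "predetermined plurality" of question keywords is assumed to default to two. The names `split_sentences`, `select_potential_answers` and `min_keywords` are hypothetical. Scoring of the selected sentences is not shown.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): breaks after '.', '!' or '?' followed by space.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_potential_answers(text, question_keywords, theme_words, min_keywords=2):
    """Keep sentences having at least min_keywords question keywords, or at
    least one question keyword together with at least one theme word."""
    keywords = {k.lower() for k in question_keywords}
    themes = {t.lower() for t in theme_words}
    selected = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        hits = len(words & keywords)
        if hits >= min_keywords or (hits >= 1 and words & themes):
            selected.append(sentence)
    return selected
```

The sketch counts distinct question keywords per sentence; counting repeated occurrences instead would be an equally plausible reading.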

Additionally or alternatively, in accordance with a preferred embodiment of the present invention, server 110 carries out answer extraction including, inter alia, the following functionality:

Extracting at least one answer to the question by analyzing the set of documents. The set of documents is analyzed by enhancing each document in the set by identifying capitalized phrases which appear in the document, identifying designated capitalized words belonging to the capitalized phrases and adding to the document adjacent each designated capitalized word that does not appear in a capitalized phrase, the designated capitalized word that does appear alongside thereof elsewhere in the document in a capitalized phrase; and

Carrying out analysis of the enhanced document in order to identify at least one portion thereof as a potential answer.
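By way of non-limiting illustration, the document enhancement described above may be sketched as follows. The specification does not state here which words of a capitalized phrase are "designated"; the sketch assumes that the last word of each phrase is designated, so that a lone occurrence of that word is expanded to the full phrase. The names `PHRASE` and `enhance` are hypothetical, and the pattern is deliberately simple and would also treat a capitalized sentence-initial word followed by a capitalized word as a phrase.

```python
import re

# A capitalized phrase: two or more consecutive capitalized words.
PHRASE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def enhance(text):
    """Expand lone designated capitalized words to their full phrase."""
    full_phrase = {}
    for match in PHRASE.finditer(text):
        phrase = match.group(0)
        # Assumption: the designated word is the last word of the phrase.
        full_phrase.setdefault(phrase.split()[-1], phrase)
    if not full_phrase:
        return text
    lone = "|".join(re.escape(word) for word in full_phrase)
    # Whole phrases are matched first and left unchanged; only designated
    # words standing outside a phrase are captured in group 1 and expanded.
    pattern = re.compile(PHRASE.pattern + r"|\b(" + lone + r")\b")
    return pattern.sub(
        lambda m: full_phrase[m.group(1)] if m.group(1) else m.group(0), text
    )
```

The enhanced text can then be passed to sentence extraction, so that a sentence mentioning only the designated word still carries the other words of the phrase.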

Additionally or alternatively, in accordance with a preferred embodiment of the present invention, server 110 carries out potential answer ranking among multiple potential answers, including, inter alia, identifying a multiplicity of potential answers and evaluating each of a multiplicity of potential answers according to at least one of the following criteria:

proximity of question keywords in the potential answer;

proximity of classification words and nouns in the potential answer; and

word count of at least part of the potential answer.

Server 110 preferably provides multiple “best” answers to the user via the Internet and the user's computer 100. Typical “best” answers are:

THE SOIL ON MARS IS RED BECAUSE IT CONTAINS IRON OXIDE

MARS IS RED BECAUSE OF ALL OF THE IRON AND OXIDE THAT IS CALLED RUST.

The “best” answers may be combined and presented to the user in any suitable format, such as in an editable report precursor format 130. Such a format allows the user to manipulate, annotate and edit multiple answers so as to create a report based thereon. If desired, “best” answers to multiple questions may be combined in a single editable report precursor format.

It is appreciated that the computerized document searching functionality described hereinabove with reference to FIG. 1 utilizes artificial intelligence.

Reference is now made to FIG. 2, which is a simplified flow chart of the document searching methodology of FIG. 1. As seen in FIG. 2, a user's input question is typically received from client computer 100 (FIG. 1) which employs a conventional web browser such as Microsoft® Internet Explorer®.

It is appreciated that an input question is one example of an input query, which need not necessarily be a question. Examples of input queries which are not questions are: “CAPITAL OF OHIO”, “ABRAHAM LINCOLN'S SECRETARY OF STATE” and “MAXIMUM DEPTH OF THE PACIFIC OCEAN”. For the sake of simplicity and conciseness, most of the description of the present invention is provided in the context of a query which is a question, although the present invention is not limited to queries which are questions. It is appreciated that some, most or all of the functionality of the present invention may be carried out by a single computer, which may be the client computer 100. Such a single-computer embodiment is not presently believed to be the preferred embodiment of the invention and accordingly, the invention is described herein in a multi-computer environment.

The question is normalized, typically by query processing server 110 (FIG. 1). Normalization takes place based on a predefined set of normalization rules, which can be, for example, hard-coded or stored in a look-up table. A preferred set of normalization rules appears in Table 1.

TABLE 1

Initial phrase                    Normalized phrase
which                             What
whats                             what is
what's                            what is
whens                             when is
when's                            when is
how many people live in           what is the population of
how many people are in            what is the population of
how many people are there in      what is the population of
people live in                    population
what nationality                  where was born
how rich is                       how much money
what month                        When
what year                         When
what day                          When
explain the                       what is the
explain                           what is
color is                          color of
colour is                         colour of
what fraction                     what percent
what nationality                  where was born
how rich is                       how much money
how much is a                     how much is a cost
how tall                          how tall height
Date of birth                     born
Long, live                        lifespan
life span                         lifespan
can you explain                   what is
could you explain                 what is
what is the reason                why
How far is                        what is the distance to
How far away is                   what is the distance to
color                             color
brain boost                       brainboost
how old is                        when was born
what happens when                 why does
what happens                      why
how big is                        what is the area of
percentage                        percent
world war two                     world war II
world war three                   world war III
this year                         2005
next year                         2006
brain boost                       brainboost
how old is                        when was born
how wide                          how wide width
how deep                          how deep depth

Preferably, the normalization rules are formulated in order to provide standardization which enhances the efficiency of the methodology of the present invention.

Examples of operation of normalization functionality include conversion of:

“what's” to —what is—;

“people live in” to —population—;

“how come” to —why—; and

“what year”, “what month” and “what day” to —when—.
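By way of non-limiting illustration only, a look-up-table form of the normalization functionality may be sketched in Python as follows. The sketch holds only a few rows taken from Table 1; the names NORMALIZATION_RULES and normalize, the lower-casing of the query and the longest-phrase-first matching order are assumptions made for the sake of the example and are not taken from the description.

```python
import re

# A few rows taken from Table 1; a full rule set would hold the whole table.
NORMALIZATION_RULES = {
    "what's": "what is",
    "whats": "what is",
    "how many people live in": "what is the population of",
    "people live in": "population",
    "what year": "when",
    "what month": "when",
    "what day": "when",
}

# Assumption: longer initial phrases are tried first, so that
# "how many people live in" wins over the shorter "people live in".
_PATTERN = re.compile(
    r"\b("
    + "|".join(
        re.escape(phrase)
        for phrase in sorted(NORMALIZATION_RULES, key=len, reverse=True)
    )
    + r")\b"
)


def normalize(query: str) -> str:
    """Lower-case the query and apply the look-up table in a single pass."""
    text = " ".join(query.lower().split())
    return _PATTERN.sub(lambda m: NORMALIZATION_RULES[m.group(1)], text)
```

A single pass over the query is used so that the output of one rule is not rewritten again by another rule; whether the described functionality cascades rules in this way is not stated.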

Queries which are not formulated by the user in question syntax are converted to question syntax. For example:

“CAPITAL OF MASSACHUSETTS” is converted to —WHAT IS THE CAPITAL OF MASSACHUSETTS—.

“LENGTH OF BROOKLYN BRIDGE” is converted to —WHAT IS THE LENGTH OF THE BROOKLYN BRIDGE—

In the example of FIG. 1, the input question “HOW COME MARS IS RED” is converted to —WHY IS MARS RED?—

In accordance with a preferred embodiment of the present invention, question normalization also preferably includes synonym expansion and/or replacement. Preferably synonym expansion and/or replacement employs synonym retrieving functionality, preferably provided by server 110. The synonym retrieving functionality is preferably operative in response to questions to generate document search terms including at least one additional search term not present in the question and to generate the at least one additional search term by replacing at least one word in the question by at least one selected synonym thereof. In accordance with a preferred embodiment of the present invention, the synonym retrieving functionality is operative to identify the at least one selected synonym at least partially by reference to a word in the question other than the at least one word which is replaced by the synonym. The at least one additional search term may be employed in place of or in addition to the search term defined by the question.

Preferably, the synonym retrieving functionality is operative to identify the selected synonym by identifying a plurality of synonyms and selecting at least one of the plurality of synonyms for which there exists a phrase relevant to the question in a corpus.

In accordance with a preferred embodiment of the present invention the synonym retrieving functionality is operative to identify the selected synonym by:

searching a corpus for occurrences of at least one of the plurality of synonyms for which there exists a phrase relevant to the question; and

designating at least one synonym as a selected synonym in accordance with the number of occurrences in the corpus of a phrase including the synonym which is relevant to the question.

In accordance with an additional embodiment of the invention, the synonym generation functionality described hereinabove may have a context-based thesaurus application which could be outside of the context of document searching. In such an embodiment, there is provided computerized synonym generating functionality which is operative for:

receiving a stream of words;

employing a computer for generating a list of synonyms for at least one word in the stream of words;

employing a computer for searching a corpus for synonym-containing phrases including synonyms in the list of synonyms together with at least part of the stream of words;

employing a computer for evaluating the frequency of occurrence of each of the synonym-containing phrases; and

proposing at least one selected synonym which forms part of a synonym-containing phrase having a relatively high frequency of occurrence in the corpus.

Preferably the synonym generating functionality is also operative for:

employing a computer for searching the corpus for received phrases including the at least one word together with the at least part of the stream of words;

employing a computer for comparing the frequency of occurrence of the received phrases in the corpus with the frequency of occurrence of the synonym-containing phrases; and

proposing at least one selected synonym which forms part of a synonym-containing phrase only if the frequency of occurrence of the synonym-containing phrase exceeds the frequency of occurrence of the received phrase.
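By way of non-limiting illustration only, the frequency comparison described above may be sketched in Python as follows. The corpus is represented here as a plain list of strings and a phrase as the candidate word followed by one context word; both representations, and the names count_phrase and propose_synonyms, are hypothetical simplifications and are not taken from the description.

```python
def count_phrase(corpus: list, phrase: str) -> int:
    """Count case-insensitive occurrences of a phrase across the corpus texts."""
    needle = phrase.lower()
    return sum(text.lower().count(needle) for text in corpus)


def propose_synonyms(word: str, context: str, synonyms: list, corpus: list) -> list:
    """Propose synonyms whose synonym-containing phrase is more frequent in
    the corpus than the received phrase, most frequent first."""
    received = count_phrase(corpus, f"{word} {context}")
    scored = [(count_phrase(corpus, f"{s} {context}"), s) for s in synonyms]
    # Keep a synonym only if its phrase occurs more often than the received phrase.
    return [s for n, s in sorted(scored, reverse=True) if n > received]
```

For instance, with a hypothetical toy corpus in which "tall mountain" occurs twice and "big mountain" once, propose_synonyms("big", "mountain", ["tall", "high", "large"], corpus) would propose only "tall".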

Following question normalization, the results of the normalization functionality undergo question classification. Question classification functionality is operative to attempt to classify the question into at least one of a predetermined set of categories based on a predefined set of classification rules, which can be, for example, hard-coded or stored in a look-up table. A preferred set of classification rules appears in Table 2. It is appreciated that some questions do not fall into any one of the predetermined set of classification categories.

Examples of classification categories include:

Questions relating to date such as:

“WHEN WAS GROVER CLEVELAND BORN?”

Questions relating to length such as:

“HOW LONG IS THE MISSISSIPPI RIVER?”

Questions relating to color such as:

“WHAT COLOR IS NEPTUNE?”

TABLE 2

Question                  Classification words
how large                 length
how big                   length
how small                 length
how high                  length
what diameter             length
how parsecs               length
how light years           length
how m                     length
how millimeters           length
how millimeter            length
how mm                    length
how inches                length
how inch                  length
how centimeters           length
how centimeter            length
how cm                    length
how meters                length
how meter                 length
how kilometers            length
how kilometer             length
how kmh                   length
how feet                  length
how foot                  length
how ft                    length
how yards                 length
how yard                  length
how yd                    length
how miles                 length
how mile                  length
how mi                    length
how mph                   length
how k/m                   length
how deep                  length
how short                 length
how tall                  length
how taller                length
how large                 length
how big                   length
how small                 length
how high                  length
what diameter             length
how parsecs               length
how light years           length
how m                     length
how millimeters           length
how wide                  length
how shorter               length
how wider                 length
how fast                  length
how thick                 length
how faster                length
what distance             length
how distance              length
what velocity             length
what depth                length
what length               length
what height               length
what width                length
what speed                length
what airspeed             length
what size                 length
what area of              length
what elevation            length
what radius               length
what altitude             length
what thickness            length
how wide                  length
how shorter               length
how wider                 length
how fast                  length
how thick                 length
how faster                length
what distance             length
how distance              length
what velocity             length
what depth                length
what length               length
what height               length
what width                length
what speed                length
what airspeed             length
what size                 length
what area of              length
what elevation            length
what radius               length
what altitude             length
what thickness            length
how wide                  length
how shorter               length
how wider                 length
how fast                  length
how old                   numeric
how many                  numeric
how much                  numeric
Lifespan                  numeric
population                numeric
what planets              planet
what moons                planet
what planet               planet
what moon                 planet
how old                   numeric
how many                  numeric
how much                  numeric
what state matter         matter
what state                state
what states               state
what ocean                ocean
how big                   big
how large                 big
What phone number         phone
What telephone number     phone
what time                 time
what hour                 time
what hours                time
what organ                organ
what percent              percent
what percentage           percent
what country              country
what countries            country
what nation               country
what nations              country
which country             country
what color                color
what colors               color
how much time             duration
how often                 duration
how long                  long
what length               long
how far                   long
how close                 long
how farther               long
how longer                long
how grams                 weight
how kilograms             weight
how kilogram              weight
how kg                    weight
how tonnes                weight
how ounces                weight
how ounce                 weight
how oz                    weight
how pounds                weight
how pound                 weight
how lbs                   weight
how lb                    weight
how weigh                 weight
how heavy                 weight
how heavier               weight
how light                 weight
how lighter               weight
how much payload          weight
what weigh                weight
what atomic weight        numeric
what weight               weight
what mass                 weight
what density              weight
How milliliters           volume
How milliliter            volume
How ml                    volume
How liters                volume
How liter                 volume
How pints                 volume
How pint                  volume
How pt                    volume
How quarts                volume
How quart                 volume
How qt                    volume
How gallons               volume
How gallon                volume
How gal                   volume
How teaspoons             volume
How teaspoon              volume
How tsp                   volume
How tablespoons           volume
How tablespoon            volume
How tbsp                  volume
how hot                   temperature
how cold                  temperature
how degrees               temperature
how degree                temperature
what temperature          temperature
how much pay              money
how much cost             money
how much money            money
how much spend            money
how much sold             money
how much pay              money
how much worth            money
how much profit           money
what price                money
what cost                 money
what worth                money
what monetary value       money
When                      date
what date                 date
what day                  date
what month                date
what year                 date
what birthday             date
what birthdate            date
what frequency            frequency
who was the               who2
who is the                who2
who is                    Who2
Who                       who2
what is                   define
Lifespan                  numeric
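By way of non-limiting illustration only, a look-up-table form of the question classification functionality may be sketched in Python as follows. Only a few rows of Table 2 are included. The matching policy, under which a rule applies when all of its words occur in order in the question, and the names CLASSIFICATION_RULES and classify, are assumptions made for the sake of the example; the description does not state how the rules are matched.

```python
# A few rows from Table 2: (question words, classification word).
CLASSIFICATION_RULES = [
    ("how tall", "length"),
    ("how long", "long"),
    ("how many", "numeric"),
    ("what color", "color"),
    ("when", "date"),
]


def classify(question: str) -> list:
    """Return every classification word whose rule words all occur, in
    order, in the question (assumed matching policy)."""
    words = question.lower().strip("?!. ").split()
    categories = []
    for rule, category in CLASSIFICATION_RULES:
        position = 0
        for rule_word in rule.split():
            try:
                # Look for the next rule word after the previous match.
                position = words.index(rule_word, position) + 1
            except ValueError:
                break
        else:
            if category not in categories:
                categories.append(category)
    return categories
```

An empty list is returned for a question which does not fall into any of the categories, consistent with the statement that some questions remain unclassified.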

Following question classification, the normalized question, which may or may not be classified in one or more predetermined categories, is employed for expected answer generation. Expected answer generation functionality is operative to generate expected answers to a normalized question based on a predefined set of expected answer generation rules, which can be, for example, hard-coded or stored in a look-up table.

Expected answer generation functionality reformats a normalized question into answer syntax likely to appear in the correct answer to the question. The expected answer generation rules preferably include substantially all verbs in a relevant language (e.g., English) as well as predefined conjugation rules. For example, where the phrase “why is” appears, the word “why” is removed, the word “is” is inserted before the last word of the query and the word “because” is added at the end of the entire string. As another example, where the phrase “why did” appears, the word “why” is removed and the verb is converted into the past tense.

For example, the question: WHEN WAS JOHN DOE BORN? is reformatted to —JOHN DOE WAS BORN ON . . . —

As a further example, the question: WHY DID THE VOLCANO ERUPT? is reformatted to —THE VOLCANO ERUPTED BECAUSE . . . —

In the example referenced in FIG. 1, the normalized question: WHY IS MARS RED? is reformatted to —MARS IS RED BECAUSE . . . —
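By way of non-limiting illustration only, the “why is” rule described above may be sketched in Python as follows. Only this one rule is covered; the function name expected_answer_why_is is hypothetical, and the sketch assumes an upper-case question of at least four words.

```python
def expected_answer_why_is(question: str) -> str:
    """Apply the "why is" rule: drop "why", move "is" before the last
    word of the query and append "because"."""
    words = question.strip().rstrip("?").split()
    if len(words) < 4 or [w.upper() for w in words[:2]] != ["WHY", "IS"]:
        raise ValueError("only the 'why is' rule is sketched here")
    rest = words[2:]                      # query without "why" and "is"
    rest.insert(len(rest) - 1, words[1])  # "is" goes before the last word
    return " ".join(rest + ["BECAUSE"])
```

Applied to the normalized question of FIG. 1, the sketch yields the string MARS IS RED BECAUSE.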

Following expected answer generation, the expected answer undergoes noun extraction. Noun extraction is preferably carried out by initially tagging parts of speech in the expected answer, using a conventional part of speech tagger, such as the Brill Tagger, which is accessible, for example, on www.cs.jhu.edu/~brill.

The noun extraction functionality then extracts all of the nouns in the expected answer.

In the example of FIG. 1, the extracted nouns are: MARS & RED.

Following noun extraction, the extracted nouns and the expected answer are supplied to preliminary search engine query generation functionality, which generates preliminary search engine queries based on the expected answer. Preliminary search engine query generation functionality preferably generates multiple preliminary search engine queries, typically four in number, in accordance with the following rules:

1. The expected answer received from expected answer generation functionality constitutes one of the preliminary search engine queries.

In the example of FIG. 1: “MARS IS RED BECAUSE OF”

2. A further preliminary search engine query is generated by removing stopwords from the beginning and end of the expected answer.

In the example of FIG. 1: “MARS IS RED BECAUSE”

3. An additional preliminary search engine query is generated by removing all of the stopwords from the expected answer.

In the example of FIG. 1: MARS+RED+BECAUSE

4. A further preliminary search engine query is generated by retaining only the nouns in the expected answer.

In the example of FIG. 1: MARS+RED
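By way of non-limiting illustration only, the four rules above may be sketched in Python as follows. The stopword list STOPWORDS and the function name preliminary_queries are assumptions made for the sake of the example; the nouns are taken as an input, since part of speech tagging is outside the scope of the sketch.

```python
# Assumption: a tiny stopword list, sufficient for the example of FIG. 1.
STOPWORDS = {"is", "of", "the", "a", "an", "and", "but"}


def preliminary_queries(expected_answer: str, nouns: list) -> list:
    """Build the four preliminary search engine queries (rules 1 to 4)."""
    words = expected_answer.split()

    # Rule 2: trim stopwords from the beginning and end only.
    trimmed = list(words)
    while trimmed and trimmed[0].lower() in STOPWORDS:
        trimmed.pop(0)
    while trimmed and trimmed[-1].lower() in STOPWORDS:
        trimmed.pop()
    trimmed_phrase = " ".join(trimmed)

    # Rule 3: remove every stopword.
    content = [w for w in words if w.lower() not in STOPWORDS]

    return [
        f'"{expected_answer}"',  # rule 1: the expected answer as a phrase
        f'"{trimmed_phrase}"',   # rule 2
        "+".join(content),       # rule 3
        "+".join(nouns),         # rule 4: nouns only
    ]
```

Called with the expected answer MARS IS RED BECAUSE OF and the nouns MARS and RED, the sketch reproduces the four queries of the example of FIG. 1, with straight quotation marks in place of the typographic ones.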

The preliminary search engine queries are then enhanced by previous answer-derived search term concatenation. Previous answer-derived search term concatenation generates at least one additional search term, not present in the question, based on at least one previous answer received by previous answer retrieval server 112 from a previous answer database, in response to the input question. The previous answer was earlier provided by query processing server 110 in response to an earlier relevant question, prior to receipt of the current question from the user.

In accordance with a preferred embodiment of the present invention, previous answer-derived search term concatenation is carried out by server 110 (FIG. 1), which concatenates the preliminary search engine queries with the outputs of server 112, thus providing a plurality of concatenated search engine queries based on the preliminary search engine queries with the addition of previous answer-derived search terms.

In the example of FIG. 1, where the preliminary search engine queries are:

“MARS IS RED BECAUSE OF”; “MARS IS RED BECAUSE”; MARS+RED+BECAUSE; and MARS+RED

and the previous answer-derived search terms are: IRON OXIDE, RUST and IRON, the concatenated search engine queries are preferably:

“MARS IS RED BECAUSE OF”+“IRON OXIDE”+RUST+IRON; “MARS IS RED BECAUSE”+“IRON OXIDE”+RUST+IRON; MARS+RED+BECAUSE+“IRON OXIDE”+RUST+IRON; and MARS+RED+“IRON OXIDE”+RUST+IRON.
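By way of non-limiting illustration only, the concatenation shown in the example above may be sketched in Python as follows. The function name concatenate and the convention of quoting any multi-word term are assumptions inferred from the example, in which IRON OXIDE appears in quotation marks while RUST and IRON do not.

```python
def concatenate(query: str, previous_terms: list) -> str:
    """Append previous answer-derived search terms to a preliminary query,
    quoting multi-word terms so that they are searched as phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in previous_terms]
    return "+".join([query] + quoted)
```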

The concatenated search engine queries are preferably employed to perform a document retrieval web search, typically initiated by server 110 (FIG. 1) communicating via a network, such as the Internet, with conventional search engine server 120 (FIG. 1), such as an Answers.com™, GOOGLE® or YAHOO® server. Alternatively any other suitable search engine may be used to search specific domains of documents, such as news documents, business related documents and science related documents.

Searches of specific document domains may be manually or automatically actuated. In accordance with a preferred embodiment of the present invention, automatic actuation of a search in a specific document domain may be realized by comparing a query with trigger words which are highly specific to a specific document domain. For example, inquiries regarding “tsunami” can be directed automatically to a specific news document domain search engine, should the term “tsunami” be flagged as a current event item. Flagging of a current event item may be carried out manually or automatically by query processing server 110.

The search engine server 120 typically provides search results to server 110 in the form of links to relevant documents and summaries of those documents.

In the example of FIG. 1, the following typical links may be among the links supplied to server 110:

http://solarsystem.nasa.gov/planets/profile.cfm?Object=Mars&Display=Kids

http://schools.mukilteo.wednet.edu/me/staff/bullocksk/FQA/why_is_red.htm

The documents, such as HTML, WORD, XML and PDF documents, identified by the links, are automatically and concurrently downloaded.

Each retrieved document is preferably processed by answer extraction functionality, which is now described with reference to FIG. 3, taking an HTML document as an example. It is appreciated that other types of documents can be processed in a suitably similar manner.

As an initial step in answer extraction, the HTML document is subject to HTML scrubbing wherein the HTML document is converted to a text document by removing the HTML tags in a conventional manner.
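By way of non-limiting illustration only, HTML scrubbing may be sketched in Python using the standard library html.parser module as follows. The names _TextCollector and scrub_html are hypothetical. Discarding the contents of script and style elements and collapsing whitespace are assumptions of the sketch; the description states only that the tags are removed in a conventional manner.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect character data, skipping script and style contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def scrub_html(html: str) -> str:
    """Return the text of an HTML document with tags removed and
    runs of whitespace collapsed to single spaces."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```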

Following HTML scrubbing, named entity expansion of the text document takes place.

In conceptual terms, named entity expansion involves the following functionality:

Enhancing a retrieved document by identifying capitalized phrases which appear in the document, identifying designated capitalized words belonging to the capitalized phrases and adding to the document adjacent each designated capitalized word that does not appear in a capitalized phrase, the designated capitalized word that does appear alongside thereof elsewhere in the document in a capitalized phrase; and

Carrying out analysis of the enhanced document in order to identify at least one portion thereof as a potential answer.

In accordance with a preferred embodiment of the present invention, all the proper nouns and proper noun containing phrases in the text document are identified. All such proper nouns and proper noun containing phrases in the text document are expanded into the largest noun phrase form that appears in the text. This is particularly useful in situations where the text contains an abbreviation of a proper noun, such as a person's name or the name of a place.

For example, if “Planet Mars”, “Mars”, “Red Planet”, “Red Planet Mars”, and “Red Mars” all appear in the text document, the shorter forms are all expanded to read: “Red Planet Mars”.

Preferably, the named entity expansion functionality carries out the following steps in software:

Step 1—Proper nouns and phrases containing proper nouns are extracted by executing a regular expression (([A-Z][\w|,]+\s)+) which extracts all capitalized words and phrases. Regular expressions of this type are well known in the art of computer programming.

Step 2—In order to reduce incorrect results, extracted proper nouns and phrases containing proper nouns having words that are all capitalized or having a total length greater than 75 characters are ignored.

Step 3—The extracted phrases are collected in an initial list.

Step 4—The largest entry corresponding to each entry, which is entirely contained in a larger entry, is identified.

Step 5—Entries in the initial list are expanded by replacing entries which are entirely contained in a larger entry, by the largest entry, thereby defining a “largest entries list”.

For example, for an initial list containing the following entries:

“Planet Mars”, “Mars”, “Red Planet”, “Red Planet Mars”, “Earth”, “Venus” and “Red Mars”,

the largest entries list preferably contains the following entries:

“Red Planet Mars”, “Red Planet Mars”, “Red Planet Mars”, “Red Planet Mars”, “Earth”, “Venus” and “Red Planet Mars”.

Using the initial list and the largest entries list, the named entity expansion functionality modifies the text document by replacing all proper nouns and phrases containing proper nouns in the initial list with the corresponding largest proper noun phrase appearing in the largest entries list.
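By way of non-limiting illustration only, steps 4 and 5 above may be sketched in Python as follows. The function name largest_entries is hypothetical. The sketch assumes that an entry is “entirely contained” in another entry when every one of its words appears in that entry; this word-wise reading is chosen because, in the example above, “Red Mars” is expanded to “Red Planet Mars” although it is not a contiguous part of it.

```python
def largest_entries(initial: list) -> list:
    """Map each entry of the initial list to the largest entry that
    contains all of its words (assumed meaning of "entirely contained")."""

    def words(entry: str) -> set:
        return set(entry.split())

    result = []
    for entry in initial:
        # Every entry contains itself, so the candidate list is never empty.
        containing = [c for c in initial if words(entry) <= words(c)]
        # "Largest" is taken as most words, then most characters.
        result.append(max(containing, key=lambda c: (len(c.split()), len(c))))
    return result
```

Applied to the initial list of the example, the sketch returns the largest entries list given above, with “Earth” and “Venus” left unchanged.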

Following named entity expansion, the modified text document undergoes theme extraction, providing a list of words ranked by their frequency of occurrence.

In conceptual terms, theme extraction utilizes statistical analysis of the frequency of occurrence of words in the modified text document to identify at least one theme word of the document, which theme word may or may not be a question keyword. Theme extraction enables answers to the question to be found in text which does not contain a question keyword.

For example, if in response to a question such as “HOW MUCH HORSEPOWER IN A MERCEDES S500?”, there is found a modified text document containing a sentence “THE 2000 S500 IS POWERED BY A 5.0-LITER V8 PUMPING OUT 302 HORSEPOWER”, theme extraction identifies the sentence as an answer to the question, notwithstanding that the word Mercedes does not appear therein. As will be described hereinbelow, theme extraction examines the modified text document and notes that it relates to Mercedes and thus assumes that the above sentence refers to a Mercedes S500 vehicle.

Theme extraction preferably includes the following steps:

Step 1—All non-alphanumeric characters are removed from the modified text document, preferably by replacing matches of the following regular expression with spaces:

Step 2—The resulting document is then rendered into a list of words.

Step 3—The following words are then removed from the list of words:

Stopwords—Examples are: “the”, “and” & “but”

Common words, which appear very often in the English language. These words are ignored since they probably have little significance to the overall document. Examples are: “because”, “teach”, “take”, “speak”, “simply” & “select”.

Words less than three characters in length.

Preferably numbers are not removed.

Step 4—The remaining words in the list are stemmed to their roots, preferably using known stemming algorithms, such as the well-known Porter stemming algorithm. A list of stemmed words is formed.

Step 5—An occurrence frequency score is generated for every different word in the list of stemmed words, the occurrence frequency score indicating the number of occurrences of the word in the modified text document.

Step 6—Using the occurrence frequency score and knowing the number of different words in the modified text document, an average word occurrence frequency is calculated for the document. Alternatively a median word occurrence frequency may be provided.

For example, if the initial document contains the following text:

“Mars is fourth from the Sun. It is sometimes called the ‘Red Planet’ etc. because of the color of its soil. The soil on the Red Planet is red because much of the soil contains iron oxide (rust). Exploring Mars is a difficult, but worthwhile task. However there are many interesting things to see and learn. Olympus Mons may be the largest volcano in our solar system. It is three times taller than the tallest mountain on Earth, Mt. Everest.”

Following Named Entity Expansion, the modified text document contains the following text in which the expanded named entities are underlined here for the sake of clarity:

“Red Planet Mars is fourth from the Sun. It is sometimes called the ‘Red Planet Mars’ etc. because of the color of its soil. The soil on the Red Planet Mars is red because much of the soil contains iron oxide (rust). Exploring Red Planet Mars is a difficult, but worthwhile task. However there are many interesting things to see and learn. Olympus Mons may be the largest volcano in our solar system. It is three times taller than the tallest mountain on Earth, Mt. Everest.”

Following step 3 described hereinabove, the list of words is: “Red”, “Planet”, “Mars”, “fourth”, “sun”, “Red”, “Planet”, “Mars”, “etc.”, “color”, “soil”, “soil”, “Red”, “Planet”, “Mars”, “red”, “soil”, “iron”, “oxide”, “rust”, “Red”, “Planet”, “Mars”, “difficult”, “worthwhile”, “task”, “interesting”, “Olympus”, “Mons”, “largest”, “volcano”, “solar”, “system”, “three”, “mountain”, “Earth”, “Everest”.

Following stemming as described in step 4, the list of stemmed words is:

“Red”, “Planet”, “Mars”, “four”, “sun”, “Red”, “Planet”, “Mars”, “etc.”, “color”, “soil”, “soil”, “Red”, “Planet”, “Mars”, “red”, “soil”, “iron”, “oxide”, “rust”, “Red”, “Planet”, “Mars”, “difficult”, “worthwhile”, “task”, “interest”, “Olympus”, “Mons”, “large”, “volcano”, “solar”, “system”, “three”, “mountain”, “Earth”, “Everest”.

The occurrence frequency score for each of the words in the list is:

“red”—5

“planet”—4

“Mars”—4

“four”—1

“sun”—1

“etc.”—1

“color”—1

“soil”—3

“iron”—1

“oxide”—1

“rust”—1

“difficult”—1

“worthwhile”—1

“task”—1

“interest”—1

“Olympus”—1

“Mons”—1

“large”—1

“volcano”—1

“solar”—1

“system”—1

“three”—1

“mountain”—1

“earth”—1

“Everest”—1

The average word occurrence frequency for this document is 1.48.

Preferably, all words having occurrence frequencies which are less than two times the average word occurrence frequency are discarded.

In the above example, the remaining word list is:

“red”—5

“planet”—4

“Mars”—4

“soil”—3

A second average word occurrence frequency is calculated for the remaining words. In the above example the second average word occurrence frequency is 4.

Words having occurrence frequencies that are equal to or greater than the second average word occurrence frequency are defined to be “Theme Words”.

The Theme Words are then arranged in the order of their occurrence frequencies in a list, termed a Theme Word List.

For the above example, the Theme Word List preferably appears as:

“red”, “planet”, “Mars”.
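By way of non-limiting illustration only, the two-stage thresholding of steps 5 and 6 and of the paragraphs above may be sketched in Python as follows. The occurrence frequency scores are those listed in the example, written in lower case; the names FREQUENCIES and theme_words are hypothetical.

```python
from collections import Counter

# Occurrence frequency scores from the example above (lower-cased).
FREQUENCIES = Counter({"red": 5, "planet": 4, "mars": 4, "soil": 3})
FREQUENCIES.update([
    "four", "sun", "etc", "color", "iron", "oxide", "rust", "difficult",
    "worthwhile", "task", "interest", "olympus", "mons", "large", "volcano",
    "solar", "system", "three", "mountain", "earth", "everest",
])


def theme_words(frequencies):
    """Return (average, second average, Theme Word List)."""
    average = sum(frequencies.values()) / len(frequencies)
    # Discard words occurring less than two times the average.
    kept = {w: n for w, n in frequencies.items() if n >= 2 * average}
    if not kept:
        return average, 0.0, []
    second_average = sum(kept.values()) / len(kept)
    # Theme Words: at or above the second average, most frequent first.
    ranked = sorted(kept.items(), key=lambda item: -item[1])
    themes = [w for w, n in ranked if n >= second_average]
    return average, second_average, themes
```

For the example document there are 37 word occurrences over 25 different words, giving the average of 1.48; the threshold of 2.96 retains “red”, “planet”, “Mars” and “soil”, whose average of 4 then excludes “soil”.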

Following theme extraction, sentence segmentation takes place by breaking the modified text document into sentences by identifying periods while ignoring periods which are associated with common abbreviations. Examples of such common abbreviations having periods are “Mrs.”, “Mr.”, “Ltd.”, “etc.”, “Corp.” and “Atty.”.

In the above example, the Modified Text Document is:

“Red Planet Mars is fourth from the Sun. It is sometimes called the ‘Red Planet Mars’ etc. because of the color of its soil. The soil on the Red Planet Mars is red because much of the soil contains iron oxide (rust). Exploring Red Planet Mars is a difficult, but worthwhile task. However there are many interesting things to see and learn. Olympus Mons may be the largest volcano in our solar system. It is three times taller than the tallest mountain on Earth, Mt. Everest.”

Following sentence segmentation, the document appears as follows:

Sentence 1—Red Planet Mars is fourth from the Sun.

Sentence 2—It is sometimes called the ‘Red Planet Mars’ etc. because of the color of its soil.

Sentence 3—The soil on the Red Planet Mars is red because much of the soil contains iron oxide (rust).

Sentence 4—Exploring Red Planet Mars is a difficult, but worthwhile task.

Sentence 5—However there are many interesting things to see and learn.

Sentence 6—Olympus Mons may be the largest volcano in our solar system.

Sentence 7—It is three times taller than the tallest mountain on Earth, Mt. Everest.
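By way of non-limiting illustration only, sentence segmentation may be sketched in Python as follows. The names ABBREVIATIONS and split_sentences are hypothetical. The abbreviation “Mt.” is added to the examples listed above as an assumption, since Sentence 7 of the example is not broken after it.

```python
# Abbreviations listed above, plus "mt." (assumed, see Sentence 7).
ABBREVIATIONS = {"mrs.", "mr.", "ltd.", "etc.", "corp.", "atty.", "mt."}


def split_sentences(text: str) -> list:
    """Break text at periods, ignoring periods of common abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # Ignore closing quotes or brackets that follow the period.
        bare = token.rstrip("\"')’”")
        if bare.endswith(".") and bare.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a final period
        sentences.append(" ".join(current))
    return sentences
```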

Following sentence segmentation, contiguous sentence stitching is performed. Contiguous sentence stitching joins related contiguous sentences into related sentence units. Preferably contiguous sentence stitching is carried out by the following series of steps:

Step 1—The document is received in the form of a list of sentences.

Step 2—Working in reverse order, starting with the last sentence, the first word of each sentence is checked to determine whether it is a joining word.

Step 3—If the first word of the sentence is a joining word, that sentence is appended to the end of the preceding sentence as a single related sentence unit.

Preferably, the first word in each sentence is identified as a joining word, or not, by consulting a look-up table. Examples of joining words are some pronouns, such as “he”, “she” and “it” and words which indicate a time sequence, such as, for example: “before,” “after,” “beforehand,” and “afterwards”.

Referring to the preceding example, contiguous sentence stitching preferably converts the above-listed seven sentences into four related sentence units, preferably as follows:

1—Red Planet Mars is fourth from the Sun. It is sometimes called the ‘Red Planet Mars’ etc. because of the color of its soil.

2—The soil on the Red Planet Mars is red because much of the soil contains iron oxide (rust).

3—Exploring Red Planet Mars is a difficult, but worthwhile task. However there are many interesting things to see and learn.

4—Olympus Mons may be the largest volcano in our solar system. It is three times taller than the tallest mountain on Earth, Mt. Everest.
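By way of non-limiting illustration only, steps 1 to 3 of contiguous sentence stitching may be sketched in Python as follows. The names JOINING_WORDS and stitch are hypothetical. The word “however” is included in the look-up table as an assumption, since the example above joins the sentence beginning with “However” to its predecessor.

```python
# Joining words named above, plus "however" (assumed from the example).
JOINING_WORDS = {
    "he", "she", "it", "before", "after", "beforehand", "afterwards", "however",
}


def stitch(sentences: list) -> list:
    """Join each sentence that starts with a joining word to the sentence
    before it, working from the last sentence backwards."""
    units = list(sentences)
    for i in range(len(units) - 1, 0, -1):
        words = units[i].split()
        if words and words[0].strip(",.;:").lower() in JOINING_WORDS:
            units[i - 1] = units[i - 1] + " " + units[i]
            del units[i]
    return units
```

Working in reverse order means that a run of several sentences which each begin with a joining word is gathered into a single related sentence unit.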

In accordance with a preferred embodiment of the invention, potential answer filtering is performed on all of the related sentence units. Potential answer filtering is preferably effected by comparing each of the related sentence units with each of the phrases in concatenated search engine queries containing a phrase and classifying each of the related sentence units as to whether it contains the phrase in a concatenated search engine query.

If a related sentence unit is found to contain the phrase in a concatenated search engine query and if the concatenated search engine query was derived from a question which is within one of the classification categories, the related sentence unit is examined to determine whether it contains a classification word which is appropriate to that category.

For example if the question was classified into a date category, the related sentence unit is examined to ensure that it contains a date.

Thereafter, the proximity between the phrase and the date in the related sentence unit is examined. Typically if there are more than a predetermined number of characters, for example 85 characters, between the phrase and the date, the related sentence unit is not considered to be a potential answer.

As another example, if the question was classified into a numerical answer category, such as a length category, the related sentence unit is examined to determine whether a number is present, either in digits or words.

In the present example, the phrase “MARS IS RED BECAUSE” appears in the concatenated search engine query generated according to rule 2 and also appears in related sentence unit 2—“The soil on the Red Planet Mars is red because much of the soil contains iron oxide (rust).”
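By way of non-limiting illustration only, potential answer filtering for an unclassified question and for a question in the date category may be sketched in Python as follows. The names MAX_GAP, _YEAR and is_potential_answer are hypothetical. Treating a four-digit year as “a date” is a simplifying assumption of the sketch; the description does not state how dates are recognized.

```python
import re

MAX_GAP = 85  # characters; the example threshold given above

# Assumption: a four-digit year stands in for "a date".
_YEAR = re.compile(r"\b(?:1\d{3}|20\d{2})\b")


def is_potential_answer(unit: str, phrase: str, category=None) -> bool:
    """Check that the unit contains the phrase and, for the date category,
    a date within MAX_GAP characters of the phrase."""
    start = unit.lower().find(phrase.lower())
    if start < 0:
        return False
    if category != "date":
        return True
    end = start + len(phrase)
    for match in _YEAR.finditer(unit):
        if match.start() >= end:
            gap = match.start() - end   # date follows the phrase
        else:
            gap = start - match.end()   # date precedes the phrase
        if gap <= MAX_GAP:
            return True
    return False
```

The comparison is case-insensitive, which is what allows the upper-case phrase MARS IS RED BECAUSE to be found in related sentence unit 2.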

If a potential answer is not identified by this stage, a noun question keyword based search of the related sentence units takes place, preferably employing the concatenated search engine query made up of noun question keywords, which was generated in accordance with rule 4 of the Preliminary Search Engine Query Generation rules described hereinabove, such as MARS+RED+“IRON OXIDE”+IRON+RUST.

If noun question keywords are found in multiple related sentence units, the noun question keyword containing related sentence units are ranked in accordance with the number of noun question keywords found.

In the present example, a noun question keyword search of the related sentence units produces the underlined results and rankings:

1—Red Planet Mars is fourth from the Sun. It is sometimes called the ‘Red Planet Mars’ etc. because of the color of its soil. Ranking—2

2—The soil on the Red Planet Mars is red because much of the soil contains iron oxide (rust). Ranking—4

3—Exploring Red Planet Mars is a difficult, but worthwhile task. However there are many interesting things to see and learn. Ranking—2

4—Olympus Mons may be the largest volcano in our solar system. It is three times taller than the tallest mountain on Earth, Mt. Everest. Ranking—0

A question keyword based search of the related sentence units now takes place, preferably employing the concatenated search engine query made up of question keywords, which was generated in accordance with rule 3 of the Preliminary Search Engine Query Generation rules described hereinabove, such as MARS+RED+BECAUSE+“IRON OXIDE”+IRON+RUST.

If question keywords are found in multiple related sentence units, the question keyword containing related sentence units are ranked in accordance with the number of question keywords found.

In the present example, a question keyword search of the related sentence units produces the underlined results and rankings:

1—Red Planet Mars is fourth from the Sun. It is sometimes called the ‘Red Planet Mars’ etc. because of the color of its soil. Ranking—3

2—The soil on the Red Planet Mars is red because much of the soil contains iron oxide (rust). Ranking—5

3—Exploring Red Planet Mars is a difficult, but worthwhile task. However there are many interesting things to see and learn. Ranking—2

4—Olympus Mons may be the largest volcano in our solar system. It is three times taller than the tallest mountain on Earth, Mt. Everest. Ranking—0
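The keyword-count ranking described above may be sketched in Python as follows. The function name `keyword_rank` is hypothetical. The matching rules used here (case-insensitive, whole-word, longest keyword first, with matched text blanked out so that IRON is not counted again inside “IRON OXIDE”) are assumptions chosen because they reproduce the rankings of the present example; the description itself does not state how overlapping keywords are counted.

```python
import re

def keyword_rank(sentence, keywords):
    """Count how many different keywords occur in the sentence."""
    text = sentence.lower()
    found = 0
    # Assumption: longer keywords are matched first and then blanked out,
    # so a keyword contained in a longer matched phrase is not recounted.
    for kw in sorted(keywords, key=len, reverse=True):
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        if re.search(pattern, text):
            found += 1
            text = re.sub(pattern, lambda m: " " * len(m.group()), text)
    return found

NOUN_KEYWORDS = ["MARS", "RED", "IRON OXIDE", "IRON", "RUST"]
QUESTION_KEYWORDS = NOUN_KEYWORDS + ["BECAUSE"]

unit_2 = ("The soil on the Red Planet Mars is red because much of the "
          "soil contains iron oxide (rust).")
print(keyword_rank(unit_2, NOUN_KEYWORDS))      # 4
print(keyword_rank(unit_2, QUESTION_KEYWORDS))  # 5
```

Sentence units would then be sorted by this count in descending order to obtain the ranking.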

The ranked question keyword containing related sentence units are then reranked in order to take into account question keywords which do not appear in a given ranked related sentence unit but which do appear as theme words of the modified text document.

In the present example employing a noun question keyword search, reranking produces the following ranking. Theme words which are not question keywords are indicated by italics:

1—Red Planet Mars is fourth from the Sun. It is sometimes called the ‘Red Planet Mars’ etc. because of the color of its soil. Ranking—3

2—The soil on the Red Planet Mars is red because much of the soil contains iron oxide (rust). Ranking—5

3—Exploring Red Planet Mars is a difficult, but worthwhile task. However there are many interesting things to see and learn. Ranking—3

4—Olympus Mons may be the largest volcano in our solar system. It is three times taller than the tallest mountain on Earth, Mt. Everest. Ranking—0

The ranked question keyword-containing related sentence units are then examined as follows:

If a ranked related sentence unit is found to contain a question keyword in a concatenated search engine query and if the concatenated search engine query was derived from a question which is within one of the classification categories, the ranked related sentence unit is examined to determine whether it contains a classification word which is appropriate to that category.

For example, if the question was classified into a date category, the ranked related sentence unit is examined to ensure that it contains a date.

Thereafter, the proximity between a question keyword and the date in the related sentence unit is examined. Typically, if there are more than a predetermined number of characters, for example 85 characters, between the question keyword and the date, the ranked related sentence unit is not considered to be a potential answer.

As another example, if the question was classified into a numerical answer category, such as a length category, the related sentence unit is examined to determine whether a number is present, either in digits or words.

Preferably, only the related sentence unit or units having the highest ranking are retained.

In the present example employing a noun question keyword search, the following related sentence units, having the highest ranking, are retained:

2—The soil on the Red Planet Mars is red because much of the soilcontains iron oxide (rust). Ranking—5

It is a particular feature of the present invention that preferably the related sentence unit or units are then ranked on the basis of the number of question keywords appearing in a sentence or sentences corresponding thereto in the text document upstream of named entity expansion. Only the related sentence unit or units having the highest ranking are retained and are considered to be potential answers.

In the present example, related sentence unit 2 is retained, the word Mars is ignored and the related sentence unit 2 is reranked without taking into account the word Mars, which did not appear in the initial text document.

2—The soil on the Red Planet Mars is red because much of the soil contains iron oxide (rust). Ranking—4

The potential answers are then scored in accordance with the conciseness of the appearance of question keywords therein, and ranked in accordance with the score. This is achieved by examining each of the potential answers and determining the proximity between the question keywords therein. This examination preferably includes the following steps:

Step 1—Removal of stop words and all non-alphanumeric characters from each potential answer to provide a skeleton potential answer.

In the present example, the skeleton potential answers are:

2—soil Red Planet Mars red because soil contains iron oxide rust

Step 2—Noting the position of the question keywords in the skeleton potential answer;

In the present example, the positions are indicated in parentheses alongside each question keyword as follows:

2—soil Red Planet Mars(17) red(22) because soil contains iron oxide rust

Step 3—Calculating the average distance in characters of the question keywords from the beginning of the skeleton potential answer.

In the present example:

2. Average distance=(17+22)/2=19.5

Step 4—Noting, for each different question keyword, the difference between the average distance and the location of the question keyword which is closest to the average distance.

In the present example:

2. For MARS, the difference is 19.5−17=2.5; for RED, the difference is 22−19.5=2.5

Step 5—Noting, for each potential answer, the spread between the difference of the question keyword having the greatest difference and the difference of the question keyword having the smallest difference.

For a case in which the difference of the question keyword having the greatest difference is equal to the difference of the question keyword having the smallest difference and the spread is zero, the spread is defined to be the difference of the question keyword having the greatest difference from the average.

In the present example:

2. Spread=2.5

The conciseness score which indicates the conciseness of the appearance of question keywords is defined to be the value of the spread. Ranking of the potential answers is a negative function of the score, such that a potential answer having a smaller score will be ranked higher.
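Steps 3 to 5 above may be sketched in Python as follows. The sketch takes the keyword positions noted in Step 2 as input rather than deriving them from the text. The function name `conciseness_score` and the choice to average over every noted position of every keyword are assumptions; in the present example each keyword has a single position, so the description does not resolve that point.

```python
def conciseness_score(positions):
    """positions maps each question keyword to the list of its character
    positions in the skeleton potential answer (as noted in Step 2)."""
    all_positions = [p for plist in positions.values() for p in plist]
    if not all_positions:
        raise ValueError("no question keyword positions supplied")
    # Step 3: average distance from the beginning of the skeleton answer.
    average = sum(all_positions) / len(all_positions)
    # Step 4: per keyword, the difference for its occurrence closest
    # to the average distance.
    differences = [min(abs(p - average) for p in plist)
                   for plist in positions.values() if plist]
    # Step 5: spread between the greatest and smallest differences; a zero
    # spread is replaced by the greatest difference from the average.
    spread = max(differences) - min(differences)
    return spread if spread != 0 else max(differences)

print(conciseness_score({"MARS": [17], "RED": [22]}))  # 2.5
```

A smaller returned value corresponds to a higher ranking of the potential answer.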

For each document, the potential answers, each having a corresponding question keyword conciseness score, are supplied to answer ranking functionality (FIG. 2).

Answer ranking takes all of the potential answers from all of the modified text documents and generates a set of “best” answers. The answer ranking functionality preferably is operative for evaluating each of the potential answers according to at least one of the following criteria:

proximity of question keywords in the potential answer;

proximity of classification words and nouns in the potential answer; and

word count of at least part of the potential answer.

In accordance with a preferred embodiment of the invention, “best” answer filtering is performed on all of the potential answers. “Best” answer filtering is effected preferably by comparing each of the potential answers with each of the concatenated search engine queries that is a phrase and classifying each of the potential answers as to whether it contains the phrase in the concatenated search engine query defined by rule 1 above and possibly the phrase in the concatenated search engine query defined by rule 2 above.

If a predetermined number of “best” answers, preferably three, each containing the phrase in the concatenated search engine query defined by rule 1 above are found, then all potential answers not containing the phrase in the concatenated search engine query defined by rule 1 are discarded.

If a predetermined number of “best” answers, preferably two, each containing the phrase in the concatenated search engine query defined by rule 2 above are found, then all potential answers not containing the phrase in the concatenated search engine query defined by rule 1 or the phrase in the concatenated search engine query defined by rule 2 are discarded.
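The two phrase-based conditions above may be sketched in Python as follows. The function name `filter_by_phrases`, the case-insensitive substring test and the `None` return value signalling that neither condition was met are illustrative assumptions. The rule 1 phrase used in the usage example is hypothetical, since only the rule 2 phrase is given in the present example.

```python
def filter_by_phrases(answers, rule1_phrase, rule2_phrase,
                      min_rule1=3, min_rule2=2):
    """Apply the rule 1 and rule 2 phrase conditions to potential answers.
    Returns the retained answers, or None if neither condition holds."""
    p1 = rule1_phrase.lower()
    p2 = rule2_phrase.lower()
    with_rule1 = [a for a in answers if p1 in a.lower()]
    if len(with_rule1) >= min_rule1:
        # Enough answers contain the rule 1 phrase: discard all others.
        return with_rule1
    with_rule2 = [a for a in answers if p2 in a.lower()]
    if len(with_rule2) >= min_rule2:
        # Keep answers containing either phrase, in their original order.
        return [a for a in answers if p1 in a.lower() or p2 in a.lower()]
    return None  # caller falls back to the keyword based searches

answers = [
    "Mars is red because of rust.",
    "Mars is red because its soil contains iron oxide.",
    "The planet Mars is red.",
    "Venus is hot.",
]
kept = filter_by_phrases(answers, "THE PLANET MARS IS RED",
                         "MARS IS RED BECAUSE")
```

In this usage example only one answer contains the hypothetical rule 1 phrase, so the rule 2 condition applies and the first three answers are retained.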

If neither of the above two conditions is fulfilled, a noun question keyword based search of the potential answers takes place, preferably employing the concatenated search engine query made up of noun question keywords, which was generated in accordance with rule 4 of the Preliminary Search Engine Query Generation rules described hereinabove, in a manner similar to that described hereinabove with reference to potential answer filtering in FIG. 3.

If noun question keywords are found in multiple potential answers, the noun question keyword containing potential answers are ranked in accordance with the number of noun question keywords found.

The potential answer or answers having the highest ranking are retained and are considered to be “best answers” and all other potential answers are discarded.

If a “best” answer is not identified by this stage, a question keyword based search of the potential answers takes place, preferably employing the concatenated search engine query made up of question keywords, which was generated in accordance with rule 3 of the Preliminary Search Engine Query Generation rules described hereinabove, in a manner similar to that described hereinabove with reference to potential answer filtering in FIG. 3.

If question keywords are found in multiple potential answers, the question keyword containing potential answers are ranked in accordance with the number of question keywords found. The potential answer or answers having the highest ranking are retained and all other potential answers are discarded.

A conciseness/proximity score is now calculated for each potential answer. The conciseness/proximity score preferably is based on the average of the following three metrics:

1. Question keyword conciseness score as calculated by potential answer filtering functionality as described hereinabove with reference to FIG. 3;

2. Noun-classification word distance, which is the shortest distance, expressed in number of characters, between a classification word and a noun within the potential answer. If the potential answer does not belong to any of the classification categories, this distance is defined to be zero.

For example, if the question was “HOW FAR IS MARS FROM EARTH” the classification would be LENGTH. If the answer was “MARS IS 35 MILLION MILES AWAY FROM EARTH” then this score would be the distance between the word “Mars” and the length measurement “miles”, which is a distance of 19 characters.

3. Average proximity to the beginning of each potential answer of the first occurrence of each question keyword. To calculate this, the position of the first occurrence of each different question keyword is summed and divided by the number of different question keywords.

In the example brought above, the distance of each question keyword from the beginning of the potential answer is shown in parentheses, and the average proximity is indicated.

2—The soil on the Red(17) Planet is red because much of the soil contains iron oxide (rust). Average proximity=17/1=17.

In this example, the conciseness/proximity score of each of the potential answers is:

2. (2.5+0+17)/3=6.5

If the conciseness/proximity score of a potential answer is greater than a predetermined number, preferably 80, the potential answer is discarded.
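The averaging of the three metrics and the discard threshold may be sketched in Python as follows. The function and parameter names are hypothetical; the three metric values are assumed to have been computed beforehand as described above.

```python
def conciseness_proximity_score(keyword_conciseness,
                                noun_classification_distance,
                                first_occurrence_positions):
    """Average of the three metrics. first_occurrence_positions holds the
    position of the first occurrence of each different question keyword."""
    average_proximity = (sum(first_occurrence_positions)
                         / len(first_occurrence_positions))
    return (keyword_conciseness
            + noun_classification_distance
            + average_proximity) / 3

def is_retained(score, threshold=80):
    """A potential answer is discarded when its score exceeds the threshold."""
    return score <= threshold

score = conciseness_proximity_score(2.5, 0, [17])
print(score)               # 6.5
print(is_retained(score))  # True
```

The values 2.5, 0 and 17 are those of the present example, and 80 is the preferred threshold stated above.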

The remaining potential answers are preferably stitched together to form a potential answer document. The potential answer document undergoes theme extraction, providing a list of potential answer words ranked by their frequency of occurrence in the potential answer document.

In conceptual terms, theme extraction utilizes statistical analysis of the frequency of occurrence of words in the potential answer document to identify at least one theme word of the potential answer document.

Potential answer theme extraction preferably includes the following steps:

Step 1—All non-alphanumeric characters are removed from the potential answer document, preferably by replacing matches of the following regular expression with spaces:

Step 2—The resulting document is then rendered into a list of potential answer words.

Step 3—The following words are then removed from the list of words:

-   Stopwords—Examples are: “the”, “and” & “but”
-   Common words, which appear very often in the English language. These words are ignored since they probably have little significance to the overall document. Examples are: “because”, “teach”, “take”, “speak”, “simply” & “select”.
-   Words less than three characters in length.

Preferably numbers are not removed.

Step 4—The remaining potential answer words in the list are stemmed to their roots, preferably using known stemming algorithms, such as the well-known Porter-stemming algorithm.

Step 5—An occurrence frequency score is generated for every different potential answer word in the list indicating the occurrence of the potential answer word in the potential answer document.

Step 6—Using the occurrence frequency score and knowing the number of different potential answer words in the potential answer document, an average potential answer word occurrence frequency is calculated for the potential answer document. Alternatively a median potential answer word occurrence frequency may be provided.

Preferably, all potential answer words having occurrence frequencies which are less than two times the average potential answer word occurrence frequency are discarded.

A second average potential answer word occurrence frequency is calculated for the remaining potential answer words. Potential answer words having occurrence frequencies that are equal to or greater than the second average potential answer word occurrence frequency are defined to be “Potential Answer Theme Words”.

The Potential Answer Theme Words are then arranged in the order of their occurrence frequencies in a list, termed a Potential Answer Theme Word List.

Potential answers which do not contain Potential Answer Theme Words are discarded. The remaining potential answers are considered to be “best answers” and are ordered in accordance with increasing length, such that the most concise answers are presented first.

If no Potential Answer Theme Words are found, the remaining potential answers are ordered in accordance with their conciseness/proximity score.

The potential answers are preferably presented to the user, where the potential answers having the lowest conciseness/proximity score are presented first.

Preferably all Potential Answer Theme Words are stored in the Previous Answer Database (FIG. 2) for future use, thus enhancing future operation. Previously asked questions which contain Potential Answer Theme Words may be so classified in the Previous Answer Database.

In accordance with an alternative embodiment of the invention, prior to downloading all of the documents found in the Document Retrieval Web Search stage (FIG. 2), only summaries of the documents are downloaded from the search engine server 120 (FIG. 1). These summaries are preferably stitched into a Document Summary Document and theme extraction (FIG. 3) is performed thereon to obtain Summary Theme Words. The document summaries found in the Document Retrieval Web Search are then examined to determine whether they contain the Summary Theme Words. Only documents whose summaries contain at least one Summary Theme Word are downloaded and processed by the answer extraction and answer ranking functionalities (FIG. 2).
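The summary-based preselection of this alternative embodiment may be sketched in Python as follows. The function name `select_documents`, the representation of the search results as a mapping from a document identifier to its summary text, and the exact whole-word, case-insensitive comparison are assumptions; in particular, the sketch does not stem the summary words, which an actual implementation would presumably need to do in order to match stemmed theme words.

```python
import re

def select_documents(summaries, summary_theme_words):
    """Return identifiers of documents whose summary contains at least
    one Summary Theme Word. summaries maps document id -> summary text."""
    themes = {word.lower() for word in summary_theme_words}
    selected = []
    for doc_id, summary in summaries.items():
        # Assumption: words are runs of letters and digits, compared
        # without stemming.
        words = set(re.findall(r"[a-z0-9]+", summary.lower()))
        if words & themes:
            selected.append(doc_id)
    return selected

# Hypothetical summaries used only for illustration.
summaries = {
    "doc-a": "Mars is the fourth planet from the sun.",
    "doc-b": "A recipe for tomato soup.",
}
print(select_documents(summaries, ["Mars"]))  # ['doc-a']
```

Only the selected documents would then be downloaded in full and passed on to answer extraction and answer ranking.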

Reference is now made to FIG. 4, which is a simplified illustration of a typical question generating functionality operative in accordance with a preferred embodiment of the present invention. As seen in FIG. 4, a user, operating a client computer 400, employs a conventional web browser, such as Microsoft® Internet Explorer®, to access a web page 402 containing a text, and preferably containing a button 404 which enables question generation. The user presses the button 404 in order to generate at least one question which is related to the subject of the document displayed by the browser.

Alternatively, any other suitable methodology may be employed for entering a question generation command, such as the use of a voice responsive input device, a screen scraping functionality, an email functionality, an SMS functionality or an instant messaging functionality.

The request for question generation regarding the subject, including the web page 402, is supplied, typically via the Internet, to a question-generating server 410. Server 410 then utilizes theme extraction functionality in order to identify theme words present in the web page 402, and then supplies the theme words to a previously-asked question retrieval server 412.

Previously-asked question retrieval server 412 provides an output of previously-asked questions which contain the theme words, or which have previously generated answers which contain the theme words, to question generating server 410.

The retrieved questions may be combined and presented to the user in any suitable format, such as in a text box 418 which is displayed by computer 400 adjacent web page 402.

Reference is now made to FIG. 5, which is a simplified flow chart of the question generating functionality of FIG. 4. As seen in FIG. 5, an input document, such as web page 402 (FIG. 4), which is typically supplied by a user via computer 400 (FIG. 4), undergoes theme extraction by a theme extraction functionality of question generating server 410 (FIG. 4).

Theme extraction performed by the theme extraction functionality provides a list of words ranked by their frequency of occurrence in the input document.

In conceptual terms, theme extraction utilizes statistical analysis of the frequency of occurrence of words in the input document to identify at least one theme word of the input document. Theme extraction enables the generation of questions related to the main topics of the document, and not to side aspects of the document.

Theme extraction preferably includes the following steps:

Step 1—All non-alphanumeric characters are removed from the modified text document, preferably by replacing matches of the following regular expression with spaces:

Step 2—The resulting document is then rendered into a list of words.

Step 3—The following words are then removed from the list of words:

-   Stopwords—Examples are: “the”, “and” & “but”
-   Common words, which appear very often in the English language. These words are ignored since they probably have little significance to the overall document. Examples are: “because”, “teach”, “take”, “speak”, “simply” & “select”.
-   Words less than three characters in length.

Preferably numbers are not removed.

Step 4—The remaining words in the list are stemmed to their roots, preferably using known stemming algorithms, such as the well-known Porter-stemming algorithm.

Step 5—An occurrence frequency score is generated for every different word in the list indicating the occurrence of the word in the document.

Step 6—Using the occurrence frequency score and knowing the number of different words in the input document, an average word occurrence frequency is calculated for the document. Alternatively a median word occurrence frequency may be provided.

For example, if the initial document contains the following text:

“Mars, in astronomy, 4th planet from the sun, with an orbit next in order beyond that of the earth. Mars has a striking red appearance, and in its most favorable position for viewing, when it is opposite the sun, it is twice as bright as Sirius, the brightest star. Mars has a diameter of 4,200 mi (6,800 km), just over half the diameter of the earth, and its mass is only 11% of the earth's mass. The planet has a very thin atmosphere consisting mainly of carbon dioxide, with some nitrogen and argon. Mars has an extreme day-to-night temperature range, resulting from its thin atmosphere, from about 80° F. (27° C.) at noon to about −100° F. (−73° C.) at midnight; however, the high daytime temperatures are confined to less than 3 ft (1 m) above the surface.”

Following step 3 above, the list of words contains the following words:

“Mars”, “astronomy”, “4^(th)”, “planet”, “sun”, “orbit”, “order”, “beyond”, “earth”, “Mars”, “striking”, “red”, “appearance”, “favorable”, “position”, “viewing”, “opposite”, “sun”, “twice”, “bright”, “Sirius”, “brightest”, “star”, “Mars”, “diameter”, “4,200”, “6,800”, “half”, “diameter”, “earth”, “mass”, “earths”, “mass”, “planet”, “thin”, “atmosphere”, “consisting”, “mainly”, “carbon”, “dioxide”, “nitrogen”, “argon”, “Mars”, “extreme”, “temperature”, “range”, “resulting”, “thin”, “atmosphere”, “noon”, “100”, “midnight”, “daytime”, “temperatures”, “confined”, “surface”.

Following step 4 above, the list of words contains the following words:

“Mars”, “astronomy”, “4^(th)”, “planet”, “sun”, “orbit”, “order”, “beyond”, “earth”, “Mars”, “strike”, “red”, “appear”, “favor”, “position”, “view”, “opposite”, “sun”, “twice”, “bright”, “Sirius”, “bright”, “star”, “Mars”, “diameter”, “4,200”, “6,800”, “half”, “diameter”, “earth”, “mass”, “earth”, “mass”, “planet”, “thin”, “atmosphere”, “consist”, “main”, “carbon”, “dioxide”, “nitrogen”, “argon”, “Mars”, “extreme”, “temperature”, “range”, “result”, “thin”, “atmosphere”, “noon”, “100”, “midnight”, “daytime”, “temperature”, “confine”, “surface”.

Following step 5 above, the occurrence frequency score for each of the words is:

“Mars”—4

“astronomy”—1

“4^(th)”—1

“planet”—2

“sun”—2

“orbit”—1

“order”—1

“beyond”—1

“earth”—3

“strike”—1

“red”—1

“appear”—1

“favor”—1

“position”—1

“view”—1

“opposite”—1

“twice”—1

“bright”—2

“Sirius”—1

“star”—1

“diameter”—2

“4,200”—1

“6,800”—1

“half”—1

“mass”—2

“thin”—2

“atmosphere”—2

“consist”—1

“main”—1

“carbon”—1

“dioxide”—1

“nitrogen”—1

“argon”—1

“extreme”—1

“temperature”—2

“range”—1

“result”—1

“noon”—1

“100”—1

“midnight”—1

“daytime”—1

“confine”—1

“surface”—1

The average word occurrence frequency is 1.3023.

Preferably, all words having occurrence frequencies which are less than two times the average word occurrence frequency are discarded.

In the above example, the remaining list of words is:

“Mars”—4

“earth”—3

A second average word occurrence frequency is calculated for the remaining words. Words having occurrence frequencies that are equal to or greater than the second average word occurrence frequency are defined to be “Theme Words”.

The Theme Words are then arranged in the order of their occurrence frequencies in a list, termed a Theme Word List.

In the above example, the second average word occurrence frequency is (4+3)/2=3.5 and therefore the theme word list consists of: “Mars”.
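Steps 5 and 6 and the two averaging passes may be sketched in Python as follows. The sketch starts from the list of words obtained after step 4, so that no particular tokenizing expression or stemmer needs to be assumed. The function name `theme_words` and the constant `EXAMPLE_WORDS` are hypothetical; the word list is the 56-word list of the above example, with “4^(th)” written as "4th".

```python
from collections import Counter

def theme_words(stemmed_words):
    """Return the Theme Word List for a list of stemmed words."""
    counts = Counter(stemmed_words)            # step 5: occurrence frequencies
    if not counts:
        return []
    first_average = sum(counts.values()) / len(counts)   # step 6
    # Discard words occurring less than two times the average frequency.
    frequent = {w: c for w, c in counts.items() if c >= 2 * first_average}
    if not frequent:
        return []
    second_average = sum(frequent.values()) / len(frequent)
    themes = [w for w, c in frequent.items() if c >= second_average]
    # Arrange the theme words in order of decreasing occurrence frequency.
    return sorted(themes, key=lambda w: -frequent[w])

EXAMPLE_WORDS = [
    "Mars", "astronomy", "4th", "planet", "sun", "orbit", "order", "beyond",
    "earth", "Mars", "strike", "red", "appear", "favor", "position", "view",
    "opposite", "sun", "twice", "bright", "Sirius", "bright", "star", "Mars",
    "diameter", "4,200", "6,800", "half", "diameter", "earth", "mass",
    "earth", "mass", "planet", "thin", "atmosphere", "consist", "main",
    "carbon", "dioxide", "nitrogen", "argon", "Mars", "extreme",
    "temperature", "range", "result", "thin", "atmosphere", "noon", "100",
    "midnight", "daytime", "temperature", "confine", "surface",
]
print(theme_words(EXAMPLE_WORDS))  # ['Mars']
```

For this list the first average is 56/43, approximately 1.3023, which leaves “Mars” (4) and “earth” (3), and the second average of 3.5 leaves “Mars” alone, in agreement with the above example.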

Following theme extraction, a previously-asked question retrieval functionality supplies resulting theme words to a previous question database for retrieval of previously asked questions related to the theme words.

In accordance with a preferred embodiment of the present invention, the previously-asked question retrieval functionality compares the theme words to the questions and answers contained in the previously-asked questions database, and retrieves questions containing the theme words or having previously generated answers containing the theme words.

For the preceding example, the previously-asked question retrieval functionality may retrieve questions such as:

“What is the fourth planet from the sun?”

“What is twice as bright as Sirius?”

“What color is Mars?”
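The retrieval of previously asked questions by theme word may be sketched in Python as follows. The function name `retrieve_questions`, the representation of the previously-asked questions database as a list of question and answer pairs, and the case-insensitive substring comparison are assumptions. The stored answers and the third question shown are hypothetical placeholders and are not taken from this description.

```python
def retrieve_questions(theme_words, qa_pairs):
    """Return the questions that contain a theme word, or whose previously
    generated answer contains a theme word."""
    themes = [t.lower() for t in theme_words]
    retrieved = []
    for question, answer in qa_pairs:
        texts = (question.lower(), answer.lower())
        if any(theme in text for theme in themes for text in texts):
            retrieved.append(question)
    return retrieved

# Hypothetical database contents used only for illustration.
qa_pairs = [
    ("What is the fourth planet from the sun?", "Mars"),
    ("What color is Mars?", "Red"),
    ("What is the largest ocean?", "The Pacific Ocean"),
]
print(retrieve_questions(["Mars"], qa_pairs))
```

The first question is retrieved because its stored answer contains the theme word, and the second because the question itself contains it.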

The retrieved questions are preferably presented to the user, preferably alongside the input document.

Reference is now made to FIG. 6, which is a simplified illustration of a typical report precursor generating methodology operative in accordance with a preferred embodiment of the present invention. As seen in FIG. 6, a user operating a client computer 600 employs a conventional web browser, such as Microsoft® Internet Explorer®, to access a web form page 602 containing a text box 603, and preferably containing a button 604 which enables report precursor generation.

The user preferably types desired report topic words into text box 603, and then presses the button 604 in order to generate a report precursor which is related to the topic in text box 603.

Alternatively, any other suitable methodology may be employed for entering the report precursor topic, such as the use of a voice responsive input device, a screen scraping functionality, an email functionality, an SMS functionality or an instant messaging functionality.

The request for report precursor generation regarding the topic typed into text box 603 is supplied, typically via the Internet, to a report precursor-generating server 610. Server 610 supplies the desired report topic words to a previously-asked question and answer retrieval server 612.

Previously-asked question and answer retrieval server 612 provides an output of previously-asked questions which contain the topic words and answers thereto, as well as previously asked questions having previously generated answers which contain the topic words and the generated answers, to report precursor-generating server 610.

Additionally or alternatively, server 610 may utilize the previously asked questions obtained from server 612 to search a corpus, such as the Internet, for answers to the questions. Preferably, server 610 searches the corpus for answers by using the functionality described hereinabove with reference to FIGS. 1-3. The questions and answers generated in this manner are typically added to the retrieved questions and answers for generating an editable report precursor.

As a further alternative, server 610 may string the questions and answers retrieved from server 612 to form a document, which is then supplied to the question generation functionality of FIGS. 4 and 5. Server 610 may then utilize the functionality described hereinabove with reference to FIGS. 1-3 to find answers to questions generated by the methodology of FIGS. 4 and 5. The questions and answers generated in this manner are typically added to the retrieved questions and answers for generating an editable report precursor.

The retrieved questions and answers may be combined and presented to the user in any suitable format, such as in a single editable report precursor format.

Preferably, the user then edits the report precursor to form a report, by adding questions, answers to questions, or additional information into the report precursor.

In accordance with a preferred embodiment of the present invention, the editable report precursor and/or the final report are archived, and the contents thereof are used in generating and/or retrieving questions and answers for enhancing the processing of additional report precursors and the overall functionality of the previous question/answer retrieving functionality.

It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather the scope of the present invention includes combinations and subcombinations of various features of the present invention as well as modifications which would occur to persons reading the foregoing description and which are not in the prior art.

1. A document searching method comprising: employing a computer toreceive, from a user, a query including at least one search term;employing computerized answer retrieving functionality to generatedocument search terms including at least one additional search term notpresent in said query, which said at least one additional search termwas acquired, prior to receipt by said computer of said query from saiduser, by said computerized answer retrieving functionality in responseto at least one query in the form of a question; and operatingcomputerized search engine functionality to access a set of documents inresponse to said query, based not only on at least one search termsupplied by said user in said query, but also on said at least oneadditional search term provided by said computerized answer retrievingfunctionality.
 2. A document searching method according to claim 1 andwherein said query is a question.
 3. A document searching methodaccording to claim 1 and wherein said query is not a question.
 4. Adocument searching method according to claim 1 and wherein saidemploying computerized answer retrieving functionality provides said atleast one additional search term by retrieving search terms, acquiredother than in response to earlier questions, received by saidcomputerized answer retrieving functionality prior to receipt of saidquery from said user.
 5. A document searching method according to claim1 and wherein said employing a computer comprises employing saidcomputer to receive said query by at least one of: typing said query;using a voice responsive input device; using a screen scrapingfunctionality; using an email functionality; using an SMS functionality;and using an instant messaging functionality.
 6. A document searchingmethod according to claim 1 and wherein said employing computerizedanswer retrieving functionality to generate document search termscomprises utilizing computerized query normalizing functionality fornormalizing said query.
 7. A document searching method according toclaim 6 and wherein said normalizing said query is performed based atleast in part on at least one of a plurality of query normalizationrules.
 8. A document searching method according to claim 1 and whereinsaid employing computerized answer retrieving functionality to generatedocument search terms comprises generating document search terms,including said at least one additional search term not present in saidquery by replacing at least one word in said query by at least oneselected synonym thereof.
 9. A document searching method according toclaim 8 and wherein said replacing at least one word in said query by atleast one selected synonym thereof comprises employing computerizedsynonym retrieving functionality to identify said at least one selectedsynonym at least partially by reference to at least one word in saidquery other than said at least one word which is replaced by said atleast one selected synonym.
 10. A document searching method according toclaim 9 and wherein said employing computerized synonym retrievingfunctionality comprises identifying said at least one selected synonymby: identifying a plurality of synonyms; and selecting at least one ofsaid plurality of synonyms for which there exists a phrase in a corpuswhich is relevant to said query.
 11. A document searching methodaccording to claim 10 and wherein said identifying said at least oneselected synonym comprises: searching said corpus for occurrences of atleast one of said plurality of synonyms for which there exists a phrasein said corpus which is relevant to said query; and designating at leastone of said plurality of synonyms as a selected synonym in accordancewith a number of occurrences in said corpus of a phrase including saidat least one of said plurality of synonyms which is relevant to saidquery.
 12. A document searching method according to claim 1 and also comprising utilizing computerized query processing functionality to process said query prior to said operating computerized search engine functionality, said utilizing computerized query processing functionality including: utilizing said computerized query processing functionality to generate at least one expected answer to said query; utilizing said computerized query processing functionality to generate at least one preliminary search engine query based on said at least one expected answer; utilizing said computerized query processing functionality to concatenate said at least one preliminary search engine query with said at least one additional search term not present in said query, thereby to form a concatenated search engine query; and providing said concatenated search engine query to said computerized search engine functionality.
 13. A document searching method according to claim 1 and also comprising providing a representation of at least one document in said set of documents to said user.
 14. A document searching method according to claim 13 and wherein said providing a representation comprises presenting at least one link to said at least one document.
 15. A document searching method according to claim 1 and also comprising: extracting at least one answer to said query from at least one document in said set of documents; and providing said at least one answer to said user.
 16. A document searching method according to claim 15 and wherein said extracting at least one answer comprises analyzing said at least one document by: carrying out theme extraction on said at least one document, said theme extraction utilizing statistical analysis of frequency of occurrence of words to identify at least one theme word of said at least one document; extracting sentences from said at least one document; selecting at least one of said sentences as a potential answer; scoring each of said at least one of said sentences selected as a potential answer; and identifying said at least one of said sentences selected as a potential answer based at least partially on results of said scoring.
 17. A document searching method according to claim 15 and wherein said extracting at least one answer comprises: enhancing said at least one document by: identifying capitalized phrases which appear in said at least one document; identifying designated capitalized words belonging to said capitalized phrases; and adding, to said at least one document, adjacent each occurrence of a designated capitalized word that does not appear in a capitalized phrase, said designated capitalized word that does appear alongside thereof elsewhere in said document in a capitalized phrase; and carrying out analysis of said at least one document in order to identify at least one portion thereof as a potential answer.
 18. A document searching method according to claim 15 and wherein said providing said at least one answer to said user comprises presenting said at least one answer in an editable report precursor format.
 19. A document searching method according to claim 1 and wherein said employing computerized answer retrieving functionality comprises employing artificial intelligence.
 20. A system for document searching comprising: a computer operative to receive, from a user, a query including at least one search term; computerized answer retrieving functionality operative to generate document search terms including at least one additional search term not present in said query, which said at least one additional search term was acquired, prior to receipt by said computer of said query from said user, by said computerized answer retrieving functionality in response to at least one query in the form of a question; and computerized search engine functionality operative to access a set of documents in response to said query, based not only on said at least one search term but also on said at least one additional search term provided by said computerized answer retrieving functionality.
 21. A system for document searching according to claim 20 and wherein said query is a question.
 22. A system for document searching according to claim 20 and wherein said query is not a question.
 23. A system for document searching according to claim 20 and wherein said computerized answer retrieving functionality is operative to provide said at least one additional search term, by retrieving search terms acquired other than in response to earlier questions, received by said computerized answer retrieving functionality prior to receipt of said query from said user.
 24. A system for document searching according to claim 20 and wherein said computer is operative to receive said query from at least one of: a keyboard; a voice responsive input device; a screen scraping functionality; an email functionality; an SMS functionality; and an instant messaging functionality.
 25. A system for document searching according to claim 21 and wherein said computerized answer retrieving functionality includes computerized query normalizing functionality for normalizing said query.
 26. A system for document searching according to claim 25 and wherein said computerized query normalizing functionality is operative to normalize said query based at least in part on at least one of a plurality of query normalization rules.
 27. A system for document searching according to claim 20 and wherein said computerized answer retrieving functionality is operative to generate said at least one additional search term not present in said query by replacing at least one word in said query by at least one selected synonym thereof.
 28. A system for document searching according to claim 27 and wherein said computerized answer retrieving functionality includes computerized synonym retrieving functionality operative to identify said at least one selected synonym at least partially by reference to at least one word in said query other than said at least one word which is replaced by said at least one selected synonym.
 29. A system for document searching according to claim 28 and wherein said computerized synonym retrieving functionality includes a corpus and said computerized synonym retrieving functionality is operative to search said corpus for occurrences of at least one of a plurality of synonyms for which there exists a phrase relevant to said query and to designate at least one of said plurality of synonyms as a selected synonym in accordance with a number of occurrences in said corpus of a phrase including said at least one synonym which is relevant to said query.
 30. A system for document searching according to claim 20 and also comprising a document output device for providing a representation of at least one document in said set of documents to said user.
 31. A system for document searching according to claim 30 and wherein said document output device comprises a display for presenting at least one link to said at least one document.
 32. A system for document searching according to claim 20 and also comprising: computerized answer extraction functionality for extracting at least one answer from at least one document in said set of documents; and an answer output device for providing said at least one answer to said user.
 33. A system for document searching according to claim 32 and wherein said computerized answer extraction functionality includes a document analyzer operative to analyze said at least one document, said document analyzer including: computerized theme extraction functionality for carrying out theme extraction on said at least one document, said theme extraction utilizing statistical analysis of frequency of occurrence of words to identify at least one theme word of said at least one document; computerized sentence extracting functionality for extracting sentences from said at least one document; a potential answer selector for selecting at least one of said sentences as a potential answer; computerized scoring functionality for scoring each of said at least one of said sentences; and a sentence identifier for identifying at least one of said sentences selected as a potential answer based at least partially on results of said scoring.
 34. A system for document searching according to claim 32 and wherein said answer output device comprises a display for presenting said at least one answer to said user in an editable report precursor format.
 35. A system for document searching according to claim 20 and wherein said computerized answer retrieving functionality includes artificial intelligence.
 36. An answer extraction method comprising: employing a computer to receive a question from a user; employing a computer network to access a set of documents relevant to said question by employing document search terms derived by said computer from said question, said document search terms including at least one additional search term not present in the question, which said at least one additional search term was acquired prior to receipt of said question from said user; analyzing said set of documents to extract at least one answer to said question; and providing said at least one answer to said user.
 37. An answer extraction method according to claim 36 and wherein said employing a computer network includes providing said at least one additional search term, by retrieving search terms acquired in response to earlier questions, received prior to receipt of said question from said user.
 38. An answer extraction method according to claim 36 and wherein said employing a computer network includes providing said at least one additional search term by retrieving search terms, acquired other than in response to earlier questions, received prior to receipt of said question from said user.
 39. An answer extraction method according to claim 36 and wherein said employing a computer network employs artificial intelligence.
 40. An answer extraction method according to claim 36 and wherein said employing a computer to receive a question comprises employing said computer to receive said question by at least one of: typing said question; using a voice responsive input device; using a screen scraping functionality; using an email functionality; using an SMS functionality; and using an instant messaging functionality.
 41. An answer extraction method according to claim 36 and wherein said employing document search terms comprises utilizing computerized question normalizing functionality for normalizing said question.
 42. An answer extraction method according to claim 41 and wherein said normalizing said question is performed based at least in part on at least one of a plurality of question normalization rules.
 43. An answer extraction method according to claim 36 and wherein said employing document search terms comprises generating document search terms including said at least one additional search term not present in said question by replacing at least one word in said question by at least one selected synonym thereof.
 44. An answer extraction method according to claim 43 and wherein said replacing at least one word in said question by at least one selected synonym thereof comprises employing computerized synonym retrieving functionality to identify said at least one selected synonym at least partially by reference to at least one word in said question other than said at least one word which is replaced by said at least one selected synonym.
 45. An answer extraction method according to claim 44 and wherein said employing computerized synonym retrieving functionality comprises identifying said at least one selected synonym by: identifying a plurality of synonyms; and selecting at least one of said plurality of synonyms for which there exists a phrase relevant to said question in a corpus.
 46. An answer extraction method according to claim 45 and wherein said identifying said at least one selected synonym comprises: searching said corpus for occurrences of at least one of said plurality of synonyms for which there exists a phrase relevant to said question; and designating at least one synonym of said plurality of synonyms as a selected synonym in accordance with a number of occurrences in said corpus of a phrase including said at least one synonym which is relevant to said question.
 47. An answer extraction method according to claim 36 and also comprising utilizing computerized question processing functionality to process said question, said utilizing computerized question processing functionality including: utilizing said computerized question processing functionality to generate at least one expected answer to said question; utilizing said computerized question processing functionality to generate at least one preliminary search engine query based on said at least one expected answer; utilizing said computerized question processing functionality to concatenate said at least one preliminary search engine query with said at least one additional search term not present in said question, thereby to form a concatenated search engine query; and deriving said document search terms from said concatenated search engine query.
 48. An answer extraction method according to claim 36 and wherein said providing said at least one answer to said user also comprises providing a representation of at least one document of said set of documents to said user.
 49. An answer extraction method according to claim 48 and wherein said providing a representation comprises presenting at least one link to said at least one document.
 50. An answer extraction method according to claim 36 and wherein said analyzing said set of documents to extract at least one answer to said question comprises: carrying out theme extraction on plural ones of said set of documents, said theme extraction utilizing statistical analysis of frequency of occurrence of words to identify at least one theme word of said at least one document; extracting sentences from said at least one document; selecting at least one of said sentences as a potential answer; scoring each of said at least one of said sentences; and identifying at least one of said sentences selected as a potential answer based at least partially on results of said scoring.
 51. An answer extraction method according to claim 36 and wherein said analyzing said set of documents to extract said at least one answer comprises: enhancing said at least one document of said set of documents by: identifying capitalized phrases which appear in said at least one document; identifying designated capitalized words belonging to said capitalized phrases; and adding, to said at least one document adjacent each occurrence of a designated capitalized word that does not appear in a capitalized phrase, said designated capitalized word that does appear alongside thereof elsewhere in said at least one document in a capitalized phrase; and carrying out analysis of said at least one document in order to identify at least one portion thereof as a potential answer.
 52. An answer extraction method according to claim 36 and wherein said providing said at least one answer to said user comprises presenting said at least one answer in an editable report precursor format.
 53. An answer extraction method according to claim 36 and wherein said question is not phrased in question format.
 54. An answer extraction system comprising: a computer operative to receive a question from a user; computerized answer extraction functionality operative to employ a computer network to access a set of documents relevant to said question by employing document search terms derived by said computer from said question, said document search terms including at least one additional search term not present in the question, which said at least one additional search term was acquired prior to receipt of said question from said user; computerized answer analysis functionality for analyzing said set of documents to extract at least one answer to said question; and an output device operative to provide said at least one answer to said user.
 55. An answer extraction system according to claim 54 and wherein said computer network provides said at least one additional search term by retrieving search terms, acquired in response to earlier questions, received prior to receipt of said question from said user.
 56. An answer extraction system according to claim 54 and wherein said computer network provides said at least one additional search term by retrieving search terms, acquired other than in response to earlier questions, received prior to receipt of said question from said user.
 57. An answer extraction system according to claim 54 and wherein said computer network employs artificial intelligence.
 58. An answer extraction system according to claim 54 and wherein said computer is operative to receive said question from at least one of: a keyboard; a voice responsive input device; a screen scraping functionality; an email functionality; an SMS functionality; and an instant messaging functionality.
 59. An answer extraction system according to claim 54 and wherein said computerized answer extraction functionality includes computerized question normalizing functionality for normalizing said question.
 60. An answer extraction system according to claim 59 and wherein said computerized question normalizing functionality is operative to normalize said question based at least in part on at least one of a plurality of question normalization rules.
 61. An answer extraction system according to claim 54 and wherein said computerized answer extraction functionality is operative to generate said at least one additional search term not present in said question by replacing at least one word in said question by at least one selected synonym thereof.
 62. An answer extraction system according to claim 61 and wherein said computerized answer extraction functionality includes computerized synonym retrieving functionality operative to identify said at least one selected synonym at least partially by reference to at least one word in said question other than said at least one word which is replaced by said at least one selected synonym.
 63. An answer extraction system according to claim 62 and wherein said computerized synonym retrieving functionality includes a corpus and said computerized synonym retrieving functionality is operative to search said corpus for occurrences of each one of a plurality of synonyms for which there exists a phrase including said one of said plurality of synonyms relevant to said question, and to designate at least one of said plurality of synonyms as a selected synonym in accordance with a number of occurrences in said corpus of a phrase including said at least one of said plurality of synonyms relevant to said question.
 64. An answer extraction system according to claim 54 and wherein said output device is operative to provide a representation of at least one document of said set of documents to said user.
 65. An answer extraction system according to claim 64 and wherein said output device comprises a display for presenting at least one link to said at least one document to said user.
 66. An answer extraction system according to claim 54 and wherein said computerized answer extraction functionality includes: computerized theme extraction functionality for carrying out theme extraction on plural ones of said set of documents, said theme extraction utilizing statistical analysis of frequency of occurrence of words to identify at least one theme word of said at least one document; computerized sentence extracting functionality for extracting sentences from said at least one document; a potential answer selector for selecting at least one of said sentences as a potential answer; scoring functionality for scoring each said at least one of said sentences; and a sentence identifier for identifying at least one of said sentences selected as a potential answer based at least partially on results of said scoring.
 67. An answer extraction system according to claim 54 and wherein said output device comprises a display for presenting said at least one answer in an editable report precursor format.
 68. An answer extraction system according to claim 54 and wherein said question is not phrased in question format.
 69. An answer extraction method comprising: employing a computer to receive a question from a user; employing a computer network to access a set of documents relevant to said question by employing document search terms derived by said computer from said question; extracting at least one answer to said question; and providing said at least one answer to said user, said extracting at least one answer comprising: generating an expected answer to said question, said expected answer including question keywords; analyzing said set of documents by: carrying out theme extraction on plural ones of said set of documents, said theme extraction utilizing statistical analysis of the frequency of occurrence of words to identify at least one theme word of a document, which theme word may or may not be a question keyword; and extracting sentences from plural ones of said set of documents; selecting at least one of said sentences as a potential answer if it fulfills at least one of the following criteria: a sentence including at least a predetermined plurality of question keywords; and a sentence including at least one question keyword and at least one theme word; scoring each of said at least one of said sentences selected as a potential answer; and identifying at least one of said at least one of said sentences selected as a potential answer based at least partially on results of said scoring.
 70. An answer extraction method according to claim 69 and wherein said employing a computer to receive a question comprises employing said computer to receive said question by at least one of: typing said question; using a voice responsive input device; using a screen scraping functionality; using an email functionality; using an SMS functionality; and using an instant messaging functionality.
 71. An answer extraction method according to claim 69 and also comprising, prior to said employing a computer network to access a set of documents: utilizing computerized question normalization functionality for normalizing said question; and thereafter, utilizing computerized question classification functionality to classify said question.
 72. An answer extraction method according to claim 71 and wherein said normalizing said question is performed based at least in part on at least one of a plurality of question normalization rules.
 73. An answer extraction method according to claim 69 and wherein said employing a computer network comprises employing said computer to derive said document search terms including at least one additional search term not present in the question, which said at least one additional search term was acquired prior to receipt of said question from said user.
 74. An answer extraction method according to claim 69 and wherein said employing a computer network comprises employing said computer to derive said document search terms including at least one additional search term not present in the question by replacing at least one word in said question by at least one selected synonym thereof.
 75. An answer extraction method according to claim 74 and wherein said replacing at least one word in said question by at least one selected synonym thereof comprises employing computerized synonym retrieving functionality to identify said at least one selected synonym at least partially by reference to at least one word in said question other than said at least one word which is replaced by said at least one selected synonym.
 76. An answer extraction method according to claim 75 and wherein said employing computerized synonym retrieving functionality comprises identifying said at least one selected synonym by: identifying a plurality of synonyms; and selecting at least one of said plurality of synonyms for which there exists a phrase relevant to said question in a corpus.
 77. An answer extraction method according to claim 76 and wherein said identifying said at least one selected synonym comprises: searching said corpus for occurrences of at least one of said plurality of synonyms for which there exists a phrase relevant to said question; and designating at least one of said plurality of synonyms as a selected synonym in accordance with a number of occurrences in said corpus of a phrase including said at least one of said plurality of synonyms which is relevant to said question.
 78. An answer extraction method according to claim 69 and also comprising providing a representation of said at least one document of said set of documents to said user.
 79. An answer extraction method according to claim 78 and wherein said providing a representation comprises presenting at least one link to said at least one document.
 80. An answer extraction method according to claim 69 and wherein said providing said at least one answer to said user comprises presenting said at least one answer in an editable report precursor format.
 81. An answer extraction method according to claim 69 and wherein said statistical analysis comprises: for each word in said document, stemming said word to a corresponding root word; generating a word occurrence frequency score for each different root word corresponding to a word in said document; using said word occurrence frequency scores to calculate a document word occurrence frequency indicating score for said document; selecting a subset of words in said document including at least one word having a word occurrence frequency score which is greater than or equal to said document word occurrence frequency indicating score.
 82. An answer extraction method according to claim 81 and wherein said document word occurrence frequency indicating score comprises at least one of an average of said word occurrence frequency scores and a median of said word occurrence frequency scores.
 83. An answer extraction method according to claim 81 and wherein said statistical analysis comprises selecting, as said at least one theme word, at least one word having a word occurrence frequency score which is greater than or equal to twice said document word occurrence frequency indicating score.
 84. An answer extraction method according to claim 81 and wherein said statistical analysis also comprises: following said selecting a subset of words in said document, calculating a subset word occurrence frequency indicating score; and selecting, as said at least one theme word, at least one of said subset of words having a word occurrence frequency score which is greater than or equal to said subset word occurrence frequency indicating score.
 85. An answer extraction method according to claim 84 and wherein said subset word occurrence frequency indicating score comprises at least one of an average of said word occurrence frequency scores of words in said subset of words and a median of said word occurrence frequency scores of words in said subset of words.
 86. An answer extraction method according to claim 69 and wherein said question is not phrased in question format.
 87. An answer extraction system comprising: a computer operative to receive a question from a user; and computerized answer extraction functionality operative to employ a computer network to access a set of documents relevant to said question by employing document search terms derived by said computer from said question, to extract at least one answer to said question and to provide said at least one answer to said user, said computerized answer extraction functionality comprising: an expected answer generator operative to generate an expected answer to said question, said expected answer including question keywords; a document analyzer operative to carry out theme extraction on plural ones of said set of documents, said theme extraction utilizing statistical analysis of the frequency of occurrence of words in a document to identify at least one theme word of said document, which theme word may or may not be a question keyword; a sentence extractor, operative to extract sentences from plural ones of said set of documents; a potential answer selector, operative to select at least one of said sentences as a potential answer if it fulfills at least one of the following criteria: a sentence including at least a predetermined plurality of question keywords; and a sentence including at least one question keyword and at least one theme word; and a potential answer identifier, operative to calculate a score for each of said at least one of said sentences selected as a potential answer and to identify at least one of said sentences selected as a potential answer based at least partially on said score.
 88. An answer extraction system according to claim 87 and wherein said computer is operative to receive said question from at least one of: a keyboard; a voice responsive input device; a screen scraping functionality; an email functionality; an SMS functionality; and an instant messaging functionality.
 89. An answer extraction system according to claim 87 and also comprising: computerized question normalizing functionality operative to normalize said question; and computerized question classification functionality for classifying said question.
 90. An answer extraction system according to claim 89 and wherein said computerized question normalizing functionality is operative to normalize said question based at least in part on at least one of a plurality of question normalization rules.
 91. An answer extraction system according to claim 87 and wherein said computerized answer extraction functionality is operative to employ said computer to derive said document search terms, including at least one additional search term not present in the question, which said at least one additional search term was acquired prior to receipt of said question from said user.
 92. An answer extraction system according to claim 87 and wherein said computerized answer extraction functionality is operative to employ said computer to derive said document search terms, including at least one additional search term not present in the question, by replacing at least one word in said question by at least one selected synonym thereof.
 93. An answer extraction system according to claim 92 and wherein said computerized answer extraction functionality includes computerized synonym retrieving functionality operative to identify said at least one selected synonym at least partially by reference to at least one word in said question other than said at least one word which is replaced by said at least one selected synonym.
 94. An answer extraction system according to claim 93 and wherein said computerized synonym retrieving functionality includes a corpus and said computerized synonym retrieving functionality is operative to search said corpus for occurrences of at least one of a plurality of synonyms for which there exists a phrase relevant to said question and to designate at least one synonym as a selected synonym in accordance with a number of occurrences in said corpus of a phrase, including said at least one synonym which is relevant to said question.
 95. An answer extraction system according to claim 87 and also comprising a document output device for providing a representation of at least one document of said set of documents to said user.
 96. An answer extraction system according to claim 95 and wherein said document output device comprises a display for presenting at least one link to said at least one document.
 97. An answer extraction system according to claim 87 and also comprising an answer output device for providing said at least one answer to said user.
 98. An answer extraction system according to claim 97 and wherein said answer output device comprises a display for presenting said at least one answer in an editable report precursor format.
 99. An answer extraction system according to claim 87 and wherein said document analyzer comprises: computerized word stemming functionality, operative, for each word in said document, to stem said word to a corresponding root word; a word occurrence frequency score generator for generating a word occurrence frequency score for each different root word corresponding to a word in said document; computerized document word occurrence frequency indicating score calculating functionality operative to use said word occurrence frequency scores to calculate a document word occurrence frequency indicating score for said document; and computerized word selecting functionality operative to select a subset of words in said document including at least one word having a word occurrence frequency score which is greater than or equal to said document word occurrence frequency indicating score.
 100. An answer extraction system according to claim 99 and wherein said computerized document word occurrence frequency indicating score calculating functionality is operative to calculate said document word occurrence frequency indicating score by calculating at least one of an average of said word occurrence frequency scores and a median of said word occurrence frequency scores.
 101. An answer extraction system according to claim 99 and wherein said computerized word selecting functionality is operative to select, as said at least one theme word, at least one word having a word occurrence frequency score which is greater than or equal to twice said document word occurrence frequency indicating score.
 102. An answer extractionsystem according to claim 99 and wherein said document analyzer alsocomprises: computerized subset word occurrence frequency indicatingscore calculating functionality, operative to calculate a subset wordoccurrence frequency indicating score; and computerized theme wordselection functionality operative to select, as said at least one themeword, at least one of said subset of words having a word occurrencefrequency score which is greater than or equal to said subset wordoccurrence frequency indicating score.
 103. An answer extraction system according to claim 102 and wherein said computerized subset word occurrence frequency indicating score calculating functionality is operative to calculate said subset word occurrence frequency indicating score by calculating at least one of an average of said word occurrence frequency scores of words in said subset of words and a median of said word occurrence frequency scores of words in said subset of words.
 104. An answer extraction system according to claim 87 and wherein said question is not phrased in question format.
 105. An answer extractionmethod comprising: employing a computer to receive a question from auser; employing a computer network to access a set of documents relevantto said question by employing document search terms derived by saidcomputer from said question; extracting at least one answer to saidquestion; and providing said at least one answer to said user, saidextracting at least one answer including: enhancing at least one of saidset of documents by: identifying capitalized phrases which appear insaid at least one document; identifying designated capitalized wordsbelonging to said capitalized phrases; and adding, to said at least onedocument adjacent each occurrence of a designated capitalized word thatdoes not appear in a capitalized phrase, the designated capitalized wordthat does appear alongside thereof elsewhere in the document in acapitalized phrase; and carrying out analysis of said at least onedocument in order to identify at least one portion thereof as apotential answer.
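By way of non-limiting illustration only, the document enhancing recited in claim 105 might be sketched as follows. The sketch assumes that a capitalized phrase is a run of two or more consecutive capitalized words and that the designated capitalized word is the last word of such a phrase; neither assumption is stated in the claims, and the names `enhance`, `PHRASE_RE` and `TOKEN_RE` are hypothetical. Sentence-initial capitals are not treated specially, which a practical implementation would likely need to handle.

```python
import re

# Assumption: a capitalized phrase is two or more consecutive capitalized words.
PHRASE_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")
# Matches either a whole capitalized phrase or a lone capitalized word.
TOKEN_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")


def enhance(text: str) -> str:
    """Add, next to each lone designated word, the words that accompany it
    elsewhere in the text in a capitalized phrase."""
    companions = {}
    for phrase in PHRASE_RE.findall(text):
        words = phrase.split()
        # Assumption: the last word of the phrase is the designated word.
        companions.setdefault(words[-1], " ".join(words[:-1]))

    def expand(match: re.Match) -> str:
        token = match.group(0)
        if " " in token:            # already inside a capitalized phrase
            return token
        if token in companions:     # lone designated word: add companions
            return f"{companions[token]} {token}"
        return token

    return TOKEN_RE.sub(expand, text)


sample = "Bill Clinton spoke today. The audience applauded when Clinton left."
print(enhance(sample))
```

In this sketch the lone occurrence of "Clinton" is expanded to "Bill Clinton", so that later keyword matching can treat both mentions alike.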
 106. An answer extraction method according to claim105 and wherein said employing a computer to receive a questioncomprises employing said computer to receive said question by at leastone of: typing said question; using a voice responsive input device;using a screen scraping functionality; using an email functionality;using an SMS functionality; and using an instant messagingfunctionality.
 107. An answer extraction method according to claim 105and also comprising, prior to said employing said computer network:utilizing computerized question normalization functionality fornormalizing said question; and thereafter, utilizing computerizedquestion classification functionality to classify said question.
 108. Ananswer extraction method according to claim 107 and wherein saidnormalizing said question is performed based at least in part on atleast one of a plurality of question normalization rules.
 109. An answerextraction method according to claim 105 and wherein said employing acomputer network comprises employing said computer to derive saiddocument search terms, including at least one additional search term notpresent in the question, which said at least one additional search termwas acquired prior to receipt of said question from said user.
 110. Ananswer extraction method according to claim 105 and wherein saidemploying a computer network comprises employing said computer to derivesaid document search terms, including at least one additional searchterm not present in the question by replacing at least one word in saidquestion by at least one selected synonym thereof.
 111. An answer extraction method according to claim 110 and wherein said replacing at least one word in said question by at least one selected synonym thereof comprises employing computerized synonym retrieving functionality to identify said at least one selected synonym at least partially by reference to at least one word in said question other than said at least one word which is replaced by said at least one selected synonym.
 112. An answer extraction method according to claim 111 and wherein said employing computerized synonym retrieving functionality comprises identifying said at least one selected synonym by: identifying a plurality of synonyms; and selecting at least one of said plurality of synonyms for which there exists a phrase relevant to said question in a corpus.
 113. An answer extraction method according to claim 112 andwherein said identifying said at least one selected synonym comprises:searching said corpus for occurrences of at least one of said pluralityof synonyms for which there exists a phrase relevant to said question;and designating at least one synonym as a selected synonym in accordancewith a number of occurrences in said corpus of a phrase including saidat least one synonym which is relevant to said question.
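By way of non-limiting illustration only, the synonym designation of claims 111 to 113 might be sketched as follows. The sketch assumes that the phrase relevant to the question is formed by placing a candidate synonym directly before another word of the question, and that the corpus is a plain list of strings; the function name `select_synonym` and its parameters are hypothetical.

```python
from typing import List, Optional


def select_synonym(synonyms: List[str], context_word: str,
                   corpus: List[str]) -> Optional[str]:
    """Return the synonym whose phrase with context_word occurs most often
    in the corpus, or None if no such phrase occurs at all."""
    if not synonyms:
        return None
    counts = {}
    for synonym in synonyms:
        # Assumption: the relevant phrase is "<synonym> <context_word>".
        phrase = f"{synonym} {context_word}".lower()
        counts[synonym] = sum(doc.lower().count(phrase) for doc in corpus)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


corpus = [
    "How to repair engine parts",
    "Repair engine mounts quickly",
    "Some people mend engine covers",
]
print(select_synonym(["repair", "mend", "fasten"], "engine", corpus))
```

Here "repair engine" occurs twice, "mend engine" once and "fasten engine" never, so "repair" is designated as the selected synonym.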
 114. An answerextraction method according to claim 105 and wherein said extracting atleast one answer also comprises, prior to said enhancing, generating anexpected answer to said question, said expected answer includingquestion keywords, and wherein said carrying out analysis of said atleast one document comprises: carrying out theme extraction on said atleast one document, said theme extraction utilizing statistical analysisof the frequency of occurrence of words to identify at least one themeword of said at least one document, which theme word may or may not be aquestion keyword; extracting sentences from said at least one document;selecting at least one of said sentences as a potential answer if itfulfills at least one of the following criteria: a sentence including atleast a predetermined plurality of question keywords; and a sentenceincluding at least one question keyword and at least one theme word;scoring each of said at least one of said sentences selected as apotential answer; and identifying at least one of said sentencesselected as a potential answer based at least partially on results ofsaid scoring.
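By way of non-limiting illustration only, the sentence selection criteria of claim 114 might be sketched as follows. The sketch assumes a naive punctuation-based sentence splitter and a predetermined plurality of two question keywords; the function name `select_potential_answers` and the parameter `min_keywords` are hypothetical. Scoring of the selected sentences is not shown.

```python
import re
from typing import Iterable, List


def select_potential_answers(document: str,
                             question_keywords: Iterable[str],
                             theme_words: Iterable[str],
                             min_keywords: int = 2) -> List[str]:
    """Keep sentences that contain at least min_keywords question keywords,
    or at least one question keyword together with at least one theme word."""
    keywords = {w.lower() for w in question_keywords}
    themes = {w.lower() for w in theme_words}
    # Assumption: sentences end at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    selected = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        keyword_hits = len(words & keywords)
        has_theme = bool(words & themes)
        if keyword_hits >= min_keywords or (keyword_hits >= 1 and has_theme):
            selected.append(sentence)
    return selected


text = ("Paris is the capital of France. France borders Spain. "
        "The Seine flows through the city. Berlin is large.")
print(select_potential_answers(text, ["capital", "France"], ["Seine", "Spain"]))
```

The first sentence qualifies under the first criterion and the second under the second criterion; the third contains a theme word but no question keyword and is therefore not selected.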
 115. An answer extraction method according to claim 114and wherein said statistical analysis comprises: for each word in saidat least one document, stemming said word to a corresponding root word;generating a word occurrence frequency score for each different rootword corresponding to a word in said at least one document; using saidword occurrence frequency scores to calculate a document word occurrencefrequency indicating score for said at least one document; and selectingas potential theme words a subset of words in said at least one documentincluding at least one word having a word occurrence frequency scorewhich is greater than or equal to said document word occurrencefrequency indicating score.
 116. An answer extraction method according to claim 115 and wherein said document word occurrence frequency indicating score comprises at least one of an average of said word occurrence frequency scores and a median of said word occurrence frequency scores.
 117. An answer extraction method according to claim 115 and wherein said selecting as potential theme words comprises selecting, as said at least one theme word, at least one word having a word occurrence frequency score which is greater than or equal to twice said document word occurrence frequency indicating score.
 118. An answerextraction method according to claim 114 and wherein said statisticalanalysis also comprises: following said selecting as potential themewords a subset of words in said at least one document, calculating asubset word occurrence frequency indicating score; and selecting, assaid at least one theme word, at least one of said subset of wordshaving a word occurrence frequency score which is greater than or equalto said subset word occurrence frequency indicating score.
 119. Ananswer extraction method according to claim 118 and wherein said subsetword occurrence frequency indicating score comprises at least one of anaverage of said word occurrence frequency scores of words in said subsetof words and a median of said word occurrence frequency scores of wordsin said subset of words.
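By way of non-limiting illustration only, the statistical analysis of claims 115, 116, 118 and 119 might be sketched as follows. The sketch assumes that the word occurrence frequency score is a raw count of each root word and that both frequency indicating scores are averages (the claims equally permit medians); the suffix-stripping `stem` function is a crude, hypothetical stand-in for a real stemmer, and the name `theme_words` is likewise hypothetical.

```python
import re
from collections import Counter
from statistics import mean
from typing import List


def stem(word: str) -> str:
    """Crude stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def theme_words(document: str) -> List[str]:
    """Two-stage selection: keep root words scoring at or above the document
    average, then keep those scoring at or above the subset average."""
    roots = [stem(w) for w in re.findall(r"[a-z]+", document.lower())]
    scores = Counter(roots)          # word occurrence frequency scores
    if not scores:
        return []
    document_score = mean(scores.values())
    subset = {w: s for w, s in scores.items() if s >= document_score}
    subset_score = mean(subset.values())
    return sorted(w for w, s in subset.items() if s >= subset_score)


print(theme_words("cats cat cats dog dogs bird"))
```

For this input the root counts are cat 3, dog 2 and bird 1; the document average of 2 keeps "cat" and "dog", and the subset average of 2.5 then keeps only "cat".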
 120. An answer extraction method according to claim 105 and also comprising providing a representation of at least one of said set of documents to said user.
 121. An answer extraction method according to claim 120 and wherein said providing a representation comprises presenting at least one link to said at least one document.
 122. An answer extraction method according to claim 105 and wherein said providing said at least one answer to said user comprises presenting said at least one answer in an editable report precursor format.
 123. An answer extraction method according to claim 105 and wherein said question is not phrased in question format.
 124. An answer extractionsystem comprising: a computer operative to receive a question from auser; computerized answer extraction functionality operative to employ acomputer network to access a set of documents relevant to said questionby employing document search terms derived by said computer from saidquestion, to extract at least one answer to said question and to providesaid at least one answer to said user, said computerized answerextraction functionality comprising a document analyzer operative toidentify capitalized phrases which appear in a document belonging tosaid set of documents, to identify designated capitalized wordsbelonging to said capitalized phrases, to add to said document adjacenteach occurrence of a designated capitalized word that does not appear ina capitalized phrase, the designated capitalized word that does appearalongside thereof elsewhere in said document in a capitalized phrase,thereby providing an enhanced document, and to carry out analysis ofsaid enhanced document in order to identify at least one portion thereofas a potential answer.
 125. An answer extraction system according toclaim 124 and wherein said computer is operative to receive saidquestion from at least one of: a keyboard; a voice responsive inputdevice; a screen scraping functionality; an email functionality; an SMSfunctionality; and an instant messaging functionality.
 126. An answerextraction system according to claim 124 and also comprising:computerized question normalizing functionality operative fornormalizing said question; and computerized question classificationfunctionality for classifying said question.
 127. An answer extractionsystem according to claim 126 and wherein said computerized questionnormalizing functionality is operative to normalize said question basedat least in part on at least one of a plurality of questionnormalization rules.
 128. An answer extraction system according to claim124 and wherein said computerized answer extraction functionality isoperative to employ said computer to derive said document search terms,including at least one additional search term not present in thequestion, which said at least one additional search term was acquiredprior to receipt of said question from said user.
 129. An answerextraction system according to claim 124 and wherein said computerizedanswer extraction functionality is operative to employ said computer toderive said document search terms, including at least one additionalsearch term not present in the question by replacing at least one wordin said question by at least one selected synonym thereof.
 130. Ananswer extraction system according to claim 129 and wherein saidcomputerized answer extraction functionality includes computerizedsynonym retrieving functionality operative to identify said at least oneselected synonym at least partially by reference to at least one word insaid question other than said at least one word which is replaced bysaid at least one selected synonym.
 131. An answer extraction systemaccording to claim 130 and wherein said computerized synonym retrievingfunctionality includes a corpus and said computerized synonym retrievingfunctionality is operative to search said corpus for occurrences of atleast one of a plurality of synonyms for which there exists a phraserelevant to said question and to designate at least one of saidplurality of synonyms as a selected synonym in accordance with a numberof occurrences in said corpus of a phrase including said at least one ofsaid plurality of synonyms which is relevant to said question.
 132. Ananswer extraction system according to claim 124 and wherein saidcomputerized answer extraction functionality also comprises an expectedanswer generator operative to generate an expected answer to saidquestion, said expected answer including question keywords, and whereinsaid document analyzer comprises: computerized theme extractionfunctionality for carrying out theme extraction on said document, saidtheme extraction utilizing statistical analysis of the frequency ofoccurrence of words to identify at least one theme word of saiddocument, which theme word may or may not be a question keyword; asentence extractor, operative to extract sentences from said document; apotential answer selector, operative to select at least one of saidsentences as a potential answer if it fulfills at least one of thefollowing criteria: a sentence including at least a predeterminedplurality of question keywords; and a sentence including at least onequestion keyword and at least one theme word; and a potential answeridentifier, operative to calculate a score for each of said at least oneof said sentences and to identify at least one of said sentencesselected as a potential answer based at least partially on results ofsaid score.
 133. An answer extraction system according to claim 132 andwherein said document analyzer comprises: computerized word stemmingfunctionality, operative, for each word in said document, to stem saidword to a corresponding root word; a word occurrence frequency scoregenerator for generating a word occurrence frequency score for eachdifferent root word corresponding to a word in said document;computerized document word occurrence frequency indicating scorecalculating functionality operative to use said word occurrencefrequency scores to calculate a document word occurrence frequencyindicating score for said document; and computerized word selectingfunctionality operative to select a subset of words in said documentincluding at least one word having a word occurrence frequency scorewhich is greater than or equal to said document word occurrencefrequency indicating score.
 134. An answer extraction system accordingto claim 133 and wherein said computerized document word occurrencefrequency indicating score calculating functionality is operative tocalculate said document word occurrence frequency indicating score bycalculating at least one of an average of said word occurrence frequencyscores and a median of said word occurrence frequency scores.
 135. Ananswer extraction system according to claim 132 and wherein saidcomputerized theme extraction functionality is operative to select, assaid at least one theme word, at least one word having a word occurrencefrequency score which is greater than or equal to twice said documentword occurrence frequency indicating score.
 136. An answer extractionsystem according to claim 132 and wherein said document analyzer alsocomprises: computerized subset word occurrence frequency indicatingscore calculating functionality, operative to calculate a subset wordoccurrence frequency indicating score; and computerized theme wordselection functionality operative to select, as said at least one themeword, at least one of said subset of words having a word occurrencefrequency score which is greater than or equal to said subset wordoccurrence frequency indicating score.
 137. An answer extraction system according to claim 136 and wherein said computerized subset word occurrence frequency indicating score calculating functionality is operative to calculate said subset word occurrence frequency indicating score by calculating at least one of an average of said word occurrence frequency scores of words in said subset of words and a median of said word occurrence frequency scores of words in said subset of words.
 138. An answer extraction system according to claim 124 and also comprising a document output device for providing a representation of at least one of said set of documents to said user.
 139. An answer extraction system according to claim 138 and wherein said document output device comprises a display for presenting at least one link to said at least one document.
 140. An answer extraction system according to claim 124 and also comprising an answer output device for providing said at least one answer to said user.
 141. An answer extraction system according to claim 140 and wherein said answer output device comprises a display for presenting said at least one answer in an editable report precursor format.
 142. An answer extraction system according to claim 124 and wherein said question is not phrased in question format.
 143. An answerextraction method comprising: employing a computer to receive a questionfrom a user; employing a computer network to access a set of documentsrelevant to said question by employing document search terms derived bysaid computer from said question; extracting at least one answer to saidquestion; and providing said at least one answer to said user, saidextracting at least one answer to said question comprising: identifyinga multiplicity of potential answers; and evaluating each of saidmultiplicity of potential answers according to at least one of thefollowing criteria: proximity of question keywords in the potentialanswer; proximity of classification words and nouns in the potentialanswer; and word count of at least part of the potential answer.
 144. Ananswer extraction method according to claim 143 and wherein saidevaluating comprises evaluating each of said multiplicity of potentialanswers according to at least two of the following criteria: proximityof question keywords in the potential answer; proximity ofclassification words and nouns in the potential answer; and word countof at least part of the potential answer.
 145. An answer extractionmethod according to claim 143 and wherein said evaluating comprisesevaluating each of said multiplicity of potential answers according toall of the following criteria: proximity of question keywords in thepotential answer; proximity of classification words and nouns in thepotential answer; and word count of at least part of the potentialanswer.
 146. An answer extraction method according to claim 143 andwherein said evaluating comprises evaluating each of said multiplicityof potential answers according to a combination of the followingcriteria: proximity of question keywords in the potential answer;proximity of classification words and nouns in the potential answer; andword count of at least part of the potential answer.
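By way of non-limiting illustration only, an evaluation using a combination of two of the criteria of claims 143 to 146 might be sketched as follows. The claims do not specify any formula; the reciprocal-span proximity measure, the word count cap `max_words` and the simple sum used here are all assumptions, and the names `keyword_span` and `score_answer` are hypothetical. The criterion concerning classification words and nouns is omitted because it would require part-of-speech information.

```python
import re
from typing import Iterable, List, Optional, Set


def keyword_span(words: List[str], keywords: Set[str]) -> Optional[int]:
    """Distance in words between the first and last question keyword,
    or None when fewer than two keyword occurrences are present."""
    positions = [i for i, w in enumerate(words) if w in keywords]
    if len(positions) < 2:
        return None
    return positions[-1] - positions[0]


def score_answer(answer: str, keywords: Iterable[str], max_words: int = 40) -> float:
    """Assumed scoring: closer keywords and shorter answers score higher."""
    words = re.findall(r"[a-z']+", answer.lower())
    span = keyword_span(words, {k.lower() for k in keywords})
    proximity = 0.0 if span is None else 1.0 / span
    brevity = 1.0 if len(words) <= max_words else max_words / len(words)
    return proximity + brevity


candidates = [
    "The capital of France is Paris",
    "Capital cities are many and France is one",
]
ranked = sorted(candidates, key=lambda a: score_answer(a, ["capital", "France"]), reverse=True)
print(ranked[0])
```

Under these assumptions the first candidate ranks higher because its question keywords lie two words apart rather than five.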
 147. An answer extraction method according to claim 143 and wherein said extracting at least one answer also comprises selecting a sub group of said multiplicity of potential answers based on an evaluation of said multiplicity of potential answers in accordance with said criteria.
 148. An answer extraction method according to claim 147 and wherein said evaluation comprises scoring said multiplicity of potential answers in accordance with said criteria.
 149. An answer extraction methodaccording to claim 148 and also comprising: forming a potential answerdocument by combining said multiplicity of potential answers; extractinga theme of said sub group of said multiplicity of potential answers, byutilizing statistical analysis of the frequency of occurrence of wordsin said potential answer document to identify at least one theme word insaid sub group of said multiplicity of potential answers, which themeword may or may not be a question keyword; and discarding potentialanswers belonging to said sub group of said multiplicity of potentialanswers which do not include at least one of said at least one themeword.
 150. An answer extraction method according to claim 149 andwherein said statistical analysis comprises: for each word in saidpotential answer document, stemming said word to a corresponding rootword; generating a word occurrence frequency score for each differentroot word corresponding to a word in said potential answer document;using said word occurrence frequency scores to calculate a document wordoccurrence frequency indicating score for said potential answerdocument; and selecting a subset of words in said potential answerdocument including at least one word having a word occurrence frequencyscore which is greater than or equal to said document word occurrencefrequency indicating score.
 151. An answer extraction method accordingto claim 150 and wherein said document word occurrence frequencyindicating score comprises at least one of an average of said wordoccurrence frequency scores and a median of said word occurrencefrequency scores.
 152. An answer extraction method according to claim149 and wherein said extracting a theme comprises selecting, as said atleast one theme word, at least one word having a word occurrencefrequency score which is greater than or equal to twice said documentword occurrence frequency indicating score.
 153. An answer extractionmethod according to claim 149 and wherein said statistical analysis alsocomprises, following said selecting a subset of words in said potentialanswer document: calculating a subset word occurrence frequencyindicating score; and selecting as said at least one theme word, atleast one of said subset of words having a word occurrence frequencyscore which is greater than or equal to said subset word occurrencefrequency indicating score.
 154. An answer extraction method accordingto claim 153 and wherein said subset word occurrence frequencyindicating score comprises at least one of an average of said wordoccurrence frequency scores of words in said subset of words and amedian of said word occurrence frequency scores of words in said subsetof words.
 155. An answer extraction method according to claim 143 andwherein said providing said at least one answer to said user comprisesproviding said at least one answer to said user in an order governed atleast in part by at least one of: a word count of each of said at leastone answer; a score resulting from application to each of said at leastone answer of at least one of the following criteria: proximity ofquestion keywords in said at least one answer; proximity ofclassification words and nouns in said at least one answer; and wordcount of at least part of said at least one answer.
 156. An answerextraction method according to claim 143 and wherein said employing acomputer to receive said question comprises employing said computer toreceive said question by at least one of: typing said question; using avoice responsive input device; using a screen scraping functionality;using an email functionality; using an SMS functionality; and using aninstant messaging functionality.
 157. An answer extraction methodaccording to claim 143 and also comprising, prior to said employing acomputer network: utilizing computerized question normalizationfunctionality for normalizing said question; and thereafter, utilizingcomputerized question classification functionality to classify saidquestion.
 158. An answer extraction method according to claim 157 and wherein said normalizing said question is performed based at least in part on at least one of a plurality of question normalization rules.
 159. An answer extraction method according to claim 143 and wherein said employing a computer network comprises employing said computer to derive said document search terms, including at least one additional search term not present in the question, which said at least one additional search term was acquired prior to receipt of said question from said user.
 160. An answer extraction method according to claim 143 and wherein said employing a computer network comprises employing said computer to derive said document search terms, including at least one additional search term not present in the question by replacing at least one word in said question by at least one selected synonym thereof.
 161. An answer extraction method according to claim 160 and wherein said replacing at least one word in said question by at least one selected synonym thereof comprises employing computerized synonym retrieving functionality to identify said at least one selected synonym at least partially by reference to at least one word in said question other than said at least one word which is replaced by said at least one selected synonym.
 162. An answer extraction method according to claim 161 andwherein said employing computerized synonym retrieving functionalitycomprises identifying said at least one selected synonym by: identifyinga plurality of synonyms; and selecting at least one of said plurality ofsynonyms for which there exists a phrase relevant to said question in acorpus.
 163. An answer extraction method according to claim 162 andwherein said identifying said at least one selected synonym comprises:searching said corpus for occurrences of at least one of said pluralityof synonyms for which there exists a phrase relevant to said question;and designating at least one of said plurality of synonyms as a selectedsynonym in accordance with a number of occurrences in said corpus of aphrase including said at least one of said plurality of synonyms whichis relevant to said question.
 164. An answer extraction method accordingto claim 143 and wherein said identifying a multiplicity of potentialanswers also comprises: enhancing at least one of said set of documentsby: identifying capitalized phrases which appear in said at least one ofsaid set of documents; identifying designated capitalized wordsbelonging to said capitalized phrases; and adding, to said at least oneof said set of documents adjacent each occurrence of a designatedcapitalized word that does not appear in a capitalized phrase, thedesignated capitalized word that does appear alongside thereof elsewherein the document in a capitalized phrase; and carrying out analysis ofsaid at least one of said set of documents in order to identify at leastone portion thereof as a potential answer.
 165. An answer extractionmethod according to claim 164 and wherein said identifying amultiplicity of potential answers also comprises, prior to saidenhancing, generating an expected answer to said question, said expectedanswer including question keywords, and wherein said carrying outanalysis comprises: carrying out theme extraction on said at least oneof said set of documents, said theme extraction utilizing statisticalanalysis of the frequency of occurrence of words to identify at leastone theme word of said at least one of said set of documents, whichtheme word may or may not be a question keyword; extracting sentencesfrom said at least one of said set of documents; selecting at least oneof said sentences as a potential answer if it fulfills at least one ofthe following criteria: a sentence including at least a predeterminedplurality of question keywords; and a sentence including at least onequestion keyword and at least one theme word; scoring each of said atleast one of said sentences selected as a potential answer; andidentifying at least one of said sentences selected as a potentialanswer based at least partially on results of said scoring.
 166. An answer extraction method according to claim 143 and also comprising providing a representation of at least one of said set of documents to said user.
 167. An answer extraction method according to claim 166 and wherein said providing a representation comprises presenting at least one link to said at least one of said set of documents.
 168. An answer extraction method according to claim 143 and wherein said providing said at least one answer to said user comprises presenting said at least one answer in an editable report precursor format.
 169. An answer extraction method according to claim 143 and wherein said question is not phrased in question format.
 170. An answer extraction system comprising: acomputer operative to receive a question from a user; computerizedanswer extraction functionality operative to employ a computer networkto access a set of documents relevant to said question by employingdocument search terms derived by said computer from said question, toextract at least one answer to said question and to provide said atleast one answer to said user, said computerized answer extractionfunctionality being operative to identify a multiplicity of potentialanswers and to evaluate each of said multiplicity of potential answersaccording to at least one of the following criteria: proximity ofquestion keywords in the potential answer; proximity of classificationwords and nouns in the potential answer; and word count of at least partof the potential answer.
 171. An answer extraction system according toclaim 170 and wherein said computerized answer extraction functionalityis operative to evaluate each of said multiplicity of potential answersaccording to at least two of the following criteria: proximity ofquestion keywords in the potential answer; proximity of classificationwords and nouns in the potential answer; and word count of at least partof the potential answer.
 172. An answer extraction system according toclaim 171 and wherein said computerized answer extraction functionalityis operative to evaluate each of said multiplicity of potential answersaccording to all of the following criteria: proximity of questionkeywords in the potential answer; proximity of classification words andnouns in the potential answer; and word count of at least part of thepotential answer.
 173. An answer extraction system according to claim172 and wherein said computerized answer extraction functionality isoperative to evaluate each of said multiplicity of potential answersaccording to a combination of the following criteria: proximity ofquestion keywords in the potential answer; proximity of classificationwords and nouns in the potential answer; and word count of at least partof the potential answer.
 174. An answer extraction system according to claim 170 and wherein said computerized answer extraction functionality is also operative to select a sub group of said multiplicity of potential answers based on an evaluation of said multiplicity of potential answers in accordance with said criteria.
 175. An answer extraction system according to claim 174 and wherein said evaluation comprises scoring said multiplicity of potential answers in accordance with said criteria.
 176. An answer extraction system according to claim 175 and also comprising: computerized potential answer combining functionality operative to form a potential answer document by combining said multiplicity of potential answers; computerized theme extraction functionality for carrying out theme extraction on said sub group of said multiplicity of potential answers, said theme extraction utilizing statistical analysis of the frequency of occurrence of words in said potential answer document to identify at least one theme word in said sub group of said multiplicity of potential answers, which theme word may or may not be a question keyword; and computerized potential answer discarding functionality operative to discard potential answers belonging to said sub group of said multiplicity of potential answers which do not include at least one of said at least one theme word.
 177. An answer extraction system according to claim 176 and wherein said computerized theme extraction functionality comprises: computerized word stemming functionality, operative, for each word in said potential answers document, to stem said word to a corresponding root word; a word occurrence frequency score generator for generating a word occurrence frequency score for each different root word corresponding to a word in said potential answers document; computerized document word occurrence frequency indicating score calculating functionality operative to use said word occurrence frequency scores to calculate a document word occurrence frequency indicating score for said potential answers document; and computerized word selecting functionality operative to select a subset of words in said potential answers document including at least one word having a word occurrence frequency score which is greater than or equal to said document word occurrence frequency indicating score.
178. An answer extraction system according to claim 177 and wherein said computerized document word occurrence frequency indicating score calculating functionality is operative to calculate said document word occurrence frequency indicating score by calculating at least one of an average of said word occurrence frequency scores and a median of said word occurrence frequency scores.
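The stemming, frequency scoring and threshold selection recited in claims 177 and 178 can be illustrated with a minimal sketch. It is not the claimed implementation: the function names `naive_stem` and `frequent_roots` are hypothetical, the suffix stripper is a toy stand-in for a real stemmer, and the word occurrence frequency score is assumed to be a raw count of each root word.

```python
import re
from collections import Counter
from statistics import mean, median


def naive_stem(word: str) -> str:
    """Toy suffix stripper; an assumption standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def frequent_roots(text: str, use_median: bool = False) -> dict[str, int]:
    """Return root words whose count is >= the document-level score.

    The document-level score is the average (default) or the median
    of the per-root counts, as in claim 178.
    """
    words = re.findall(r"[a-z]+", text.lower())
    scores = Counter(naive_stem(w) for w in words)
    if not scores:
        return {}
    counts = list(scores.values())
    threshold = median(counts) if use_median else mean(counts)
    return {root: n for root, n in scores.items() if n >= threshold}


if __name__ == "__main__":
    sample = "search searching searched answer answers query"
    print(frequent_roots(sample))
```

In this sketch the selected subset always contains at least one root, because the most frequent root can never fall below the average or the median of the counts.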
179. An answer extraction system according to claim 176 and wherein said computerized theme extraction functionality is operative to select, as said at least one theme word, at least one word having a word occurrence frequency score which is greater than or equal to twice said document word occurrence frequency indicating score.
180. An answer extraction system according to claim 176 and also comprising: computerized subset word occurrence frequency indicating score calculating functionality, operative to calculate a subset word occurrence frequency indicating score; and computerized theme word selection functionality operative to select, as said at least one theme word, at least one of said subset of words having a word occurrence frequency score which is greater than or equal to said subset word occurrence frequency indicating score.
181. An answer extraction system according to claim 180 and wherein said computerized subset word occurrence frequency indicating score calculating functionality is operative to calculate said subset word occurrence frequency indicating score by calculating at least one of an average of said word occurrence frequency scores of words in said subset of words and a median of said word occurrence frequency scores of words in said subset of words.
182. An answer extraction system according to claim 170 and wherein said computerized answer extraction functionality provides said at least one answer to said user in an order governed at least in part by at least one of: a word count of each one of said at least one answer; and a score, resulting from application to each one of said at least one answer of at least one of the following criteria: proximity of question keywords in said at least one answer; proximity of classification words and nouns in said at least one answer; and word count of at least part of said at least one answer.
183. An answer extraction system according to claim 170 and wherein said computer is operative to receive said question from at least one of: a keyboard; a voice responsive input device; a screen scraping functionality; an email functionality; an SMS functionality; and an instant messaging functionality.
184. An answer extraction system according to claim 170 and also comprising: computerized question normalizing functionality for normalizing said question; and computerized question classification functionality for classifying said question.
185. An answer extraction system according to claim 184 and wherein said computerized question normalizing functionality is operative to normalize said question based at least in part on at least one of a plurality of question normalization rules.
186. An answer extraction system according to claim 170 and wherein said computerized answer extraction functionality is operative to employ said computer to derive said document search terms, including at least one additional search term not present in the question, which said at least one additional search term was acquired prior to receipt of said question from said user.
187. An answer extraction system according to claim 170 and wherein said computerized answer extraction functionality is operative to employ said computer to derive said document search terms, including at least one additional search term not present in the question by replacing at least one word in said question by at least one selected synonym thereof.
188. An answer extraction system according to claim 187 and wherein said computerized answer extraction functionality includes computerized synonym retrieving functionality operative to identify said at least one selected synonym at least partially by reference to at least one word in said question other than said at least one word which is replaced by said at least one selected synonym.
189. An answer extraction system according to claim 188 and wherein said computerized synonym retrieving functionality includes a corpus and said computerized synonym retrieving functionality is operative to search said corpus for occurrences of at least one of a plurality of synonyms for which there exists a phrase relevant to said question and to designate at least one of said plurality of synonyms as a selected synonym in accordance with a number of occurrences in said corpus of a phrase, including said at least one of said plurality of synonyms relevant to said question.
190. An answer extraction system according to claim 170 and wherein said computerized answer extraction functionality comprises computerized document analysis functionality operative to identify capitalized phrases which appear in at least one of said set of documents, to identify designated capitalized words belonging to said capitalized phrases and to add to said at least one of said set of documents, adjacent each occurrence of a designated capitalized word that does not appear in a capitalized phrase, the designated capitalized word that does appear alongside thereof elsewhere in said at least one of said set of documents in a capitalized phrase, thereby providing an enhanced document, and to carry out analysis of said enhanced document in order to identify at least one portion thereof as a potential answer.
191. An answer extraction system according to claim 190 and wherein said computerized answer extraction functionality also comprises an expected answer generator operative to generate an expected answer to said question, said expected answer including question keywords, and wherein said computerized document analysis functionality comprises: computerized theme extraction functionality for carrying out theme extraction on said enhanced document, said theme extraction utilizing statistical analysis of the frequency of occurrence of words to identify at least one theme word of said enhanced document, which theme word may or may not be a question keyword; a sentence extractor, operative to extract sentences from said enhanced document; a potential answer selector, operative to select at least one of said sentences as a potential answer if it fulfills at least one of the following criteria: a sentence including at least a predetermined plurality of question keywords; and a sentence including at least one question keyword and at least one theme word; and a potential answer identifier, operative to calculate a score for each of said at least one of said sentences selected as a potential answer and to identify at least one of said sentences selected as a potential answer based at least partially on results of said score.
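The two selection criteria of the potential answer selector in claim 191 can be sketched as follows. This is an illustration under stated assumptions, not the claimed system: the function name `select_potential_answers`, the regular-expression sentence splitter and the default of two for the "predetermined plurality" are all hypothetical, and the subsequent scoring step of the potential answer identifier is omitted.

```python
import re


def select_potential_answers(document, question_keywords, theme_words,
                             min_keywords=2):
    """Keep sentences that satisfy at least one of the two criteria:

    (a) the sentence contains at least `min_keywords` question keywords;
    (b) the sentence contains at least one question keyword and
        at least one theme word.
    """
    keywords = {k.lower() for k in question_keywords}
    themes = {t.lower() for t in theme_words}
    selected = []
    # Crude sentence extractor: split after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        keyword_hits = len(tokens & keywords)
        theme_hits = len(tokens & themes)
        if keyword_hits >= min_keywords or (keyword_hits >= 1 and theme_hits >= 1):
            selected.append(sentence)
    return selected


if __name__ == "__main__":
    doc = ("Paris is the capital of France. "
           "The Seine flows through Paris. "
           "Berlin is large.")
    print(select_potential_answers(doc, ["capital", "France"], ["paris"]))
```

A sentence that contains only a theme word, such as the second sentence in the sample, is not selected, since both criteria require at least one question keyword.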
192. An answer extraction system according to claim 170 and also comprising a document output device for providing a representation of at least one of said set of documents to said user.
193. An answer extraction system according to claim 192 and wherein said document output device comprises a display for presenting at least one link to said at least one of said set of documents.
194. An answer extraction system according to claim 170 and also comprising an answer output device for providing said at least one answer to said user.
195. An answer extraction system according to claim 194 and wherein said answer output device comprises a display for presenting said at least one answer in an editable report precursor format.
196. An answer extraction method according to claim 170 and wherein said question is not phrased in question format.
197. A document searching method comprising: employing a computer to receive a query including at least one search term from a user; and employing computerized synonym retrieving functionality operative in response to queries to generate document search terms including at least one additional search term not present in said query, said computerized synonym retrieving functionality being operative to generate said at least one additional search term by replacing at least one word in said query by at least one selected synonym thereof; and operating computerized search engine functionality to access a set of documents in response to said query, based on at least one of said at least one search term supplied by a user and said at least one additional search term provided by said computerized synonym retrieving functionality, said computerized synonym retrieving functionality being operative to identify said at least one selected synonym at least partially by reference to at least one word in said query other than said at least one word.
198. A document searching method according to claim 197 and wherein said computerized synonym retrieving functionality is operative to identify said at least one selected synonym by: identifying a plurality of synonyms; and selecting at least one of said plurality of synonyms for which there exists a phrase relevant to said query in a corpus.
199. A document searching method according to claim 198 and wherein said computerized synonym retrieving functionality is operative to identify said selected synonym by: searching said corpus for occurrences of said at least one of said plurality of synonyms for which there exists a phrase relevant to said query; and designating at least one of said plurality of synonyms as a selected synonym in accordance with the number of occurrences in said corpus of a phrase including said at least one of said plurality of synonyms which is relevant to said query.
200. A document searching method according to claim 197 and wherein said query is a question.
201. A document searching method according to claim 197 and wherein said query is not a question.
202. A document searching method according to claim 197 and wherein said at least one word in said query which is replaced by said at least one selected synonym thereof comprises at least one of a noun, a verb, an object of a verb and a subject of a verb.
203. A document searching system comprising: a computer operative to receive a query including at least one search term from a user; computerized synonym retrieving functionality operative, in response to queries, to generate document search terms, including at least one additional search term not present in said query and to generate said at least one additional search term by replacing at least one word in said query by at least one selected synonym thereof; and computerized search engine functionality operative to access a set of documents in response to said query, based on at least one of said at least one search term supplied by a user and said at least one additional search term provided by said computerized synonym retrieving functionality, said computerized synonym retrieving functionality being operative to identify said selected synonym at least partially by reference to a word in said query other than said at least one word.
204. A document searching system according to claim 203 and wherein said computerized synonym retrieving functionality comprises a synonym selector operative to identify a plurality of synonyms and to select at least one of said plurality of synonyms for which there exists a phrase relevant to said query in a corpus.
205. A document searching system according to claim 204 and wherein said synonym selector is operative to identify said selected synonym by: searching said corpus for occurrences of said at least one of said plurality of synonyms for which there exists a phrase relevant to said query; and designating at least one of said plurality of synonyms as a selected synonym in accordance with the number of occurrences in said corpus of a phrase including said at least one of said plurality of synonyms which is relevant to said query.
206. A document searching system according to claim 203 and wherein said query is a question.
207. A document searching system according to claim 203 and wherein said query is not a question.
208. A document searching system according to claim 203 and wherein said at least one word in said query which is replaced by said at least one selected synonym thereof comprises at least one of a noun, a verb, an object of a verb and a subject of a verb.
209. A computerized synonym generating method comprising: receiving a stream of words; employing a computer for generating a list of synonyms for at least one word in said stream of words; employing a computer for searching a corpus for synonym-containing phrases including at least one synonym in said list of synonyms together with at least part of said stream of words; employing a computer for evaluating the frequency of occurrence of each of said synonym-containing phrases; and proposing at least one selected synonym which forms part of a synonym-containing phrase having a relatively high frequency of occurrence in said corpus.
210. A computerized synonym generating method according to claim 209 and also comprising: employing a computer for searching said corpus for received phrases including said at least one word together with said at least part of said stream of words; employing a computer for comparing the frequency of occurrence of said received phrases in said corpus with the frequency of occurrence of said synonym-containing phrases; and proposing at least one selected synonym which forms part of a synonym-containing phrase only if the frequency of occurrence of said synonym-containing phrase exceeds the frequency of occurrence of said received phrase.
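The corpus-frequency comparison of claims 209 and 210 can be sketched as below. The sketch makes several assumptions that the claims do not specify: the function name `propose_synonyms` is hypothetical, the synonym list is supplied by the caller rather than generated, the corpus is a plain list of strings, the whole received word stream is used as the phrase context, and frequency is a crude case-insensitive substring count.

```python
def propose_synonyms(words, target_index, synonyms, corpus):
    """Propose synonyms whose phrase occurs more often than the received phrase.

    words        -- the received stream of words, as a list
    target_index -- position of the word to be replaced
    synonyms     -- candidate synonyms for that word (supplied by the caller)
    corpus       -- list of text strings to search
    """
    def phrase_count(candidate):
        # Build the phrase with the candidate in place of the target word.
        phrase = " ".join(
            words[:target_index] + [candidate] + words[target_index + 1:]
        ).lower()
        # Crude frequency: case-insensitive substring occurrences.
        return sum(text.lower().count(phrase) for text in corpus)

    baseline = phrase_count(words[target_index])  # received phrase frequency
    counts = {s: phrase_count(s) for s in synonyms}
    # Claim 210: propose only if the synonym phrase is strictly more frequent.
    better = [s for s in synonyms if counts[s] > baseline]
    return sorted(better, key=lambda s: counts[s], reverse=True)


if __name__ == "__main__":
    corpus = [
        "You can buy a car online.",
        "Many people buy a car on credit.",
        "Few people purchase a car with cash.",
        "They acquire a car.",
    ]
    print(propose_synonyms(["purchase", "a", "car"], 0,
                           ["buy", "acquire", "procure"], corpus))
```

Because the count is a substring match, a phrase ending in "car" would also match text containing "card"; a tokenized phrase match would be the more careful choice in practice.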
211. A computerized synonym generating method according to claim 209 and wherein said at least one word comprises at least one of a noun, a verb, an object of a verb and a subject of a verb.
212. A computerized synonym generating system comprising: a computer operative to generate a list of synonyms for at least one word in a stream of words received from a user; computerized searching functionality operative to search a corpus for synonym-containing phrases including at least one synonym in said list of synonyms together with at least part of said stream of words; computerized frequency evaluation functionality operative to evaluate the frequency of occurrence of each of said synonym-containing phrases; and computerized synonym providing functionality operative to propose at least one selected synonym which forms part of a synonym-containing phrase having a relatively high frequency of occurrence in said corpus.
213. A computerized synonym generating system according to claim 212 and also comprising: computerized received phrases searching functionality operative to search said corpus for received phrases including said at least one word together with said at least part of said stream of words; and computerized occurrence frequency comparing functionality operative to compare the frequency of occurrence of said received phrases in said corpus with the frequency of occurrence of said synonym-containing phrases, said computerized synonym providing functionality being operative to propose at least one selected synonym which forms part of a synonym-containing phrase only if the frequency of occurrence of said synonym-containing phrase exceeds the frequency of occurrence of said received phrase.
214. A computerized synonym generating system according to claim 212 and wherein said at least one word comprises at least one of a noun, a verb, an object of a verb and a subject of a verb.
215. A computerized question generation method comprising: identifying at least one theme word in a document; searching for previously asked questions containing said at least one theme word or having previously generated answers containing said at least one theme word; and presenting said previously asked questions.
216. A computerized question generation method according to claim 215 and also comprising, prior to said identifying, employing a computer to obtain said document from a user, and wherein said presenting comprises presenting said previously asked questions on said computer to said user.
217. A computerized question generation method according to claim 215 and wherein said identifying comprises carrying out statistical analysis of the frequency of occurrence of words in said document.
218. A computerized question generation method according to claim 217 and wherein said carrying out statistical analysis comprises: for each word in said document, stemming said word to a corresponding root word; generating a word occurrence frequency score for each different root word corresponding to a word in said document; using said word occurrence frequency scores to calculate a document word occurrence frequency indicating score for said document; and selecting a subset of words in said document including at least one word having a word occurrence frequency score which is greater than or equal to at least said document word occurrence frequency indicating score.
219. A computerized question generation method according to claim 218 and wherein said document word occurrence frequency indicating score comprises at least one of an average of said word occurrence frequency scores and a median of said word occurrence frequency scores.
220. A computerized question generation method according to claim 215 and wherein said identifying at least one theme word comprises selecting, as said at least one theme word, at least one word having a word occurrence frequency score which is greater than or equal to twice said document word occurrence frequency indicating score.
221. A computerized question generation method according to claim 215 and wherein said carrying out statistical analysis also comprises, following said selecting a subset of words in said document: calculating a subset word occurrence frequency indicating score; and selecting, as said at least one theme word, at least one of said subset of words having a word occurrence frequency score which is greater than or equal to said subset word occurrence frequency indicating score.
222. A computerized question generation method according to claim 221 and wherein said subset word occurrence frequency indicating score comprises at least one of an average of said word occurrence frequency scores of words in said subset of words and a median of said word occurrence frequency scores of words in said subset of words.
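The question generation method of claims 215 to 222 can be illustrated with a short sketch that combines the two-stage theme word selection with a lookup of previously asked questions. It rests on assumptions not found in the claims: the names `theme_words` and `suggest_questions` are hypothetical, the archive is assumed to be a list of (question, answer) pairs, both frequency indicating scores are taken as averages, and stemming is left out for brevity.

```python
import re
from collections import Counter
from statistics import mean


def theme_words(text):
    """Two-stage selection: document-level average, then subset-level average."""
    scores = Counter(re.findall(r"[a-z]+", text.lower()))
    if not scores:
        return set()
    # Stage 1: keep words at or above the document-level average count.
    doc_score = mean(scores.values())
    subset = {w: n for w, n in scores.items() if n >= doc_score}
    # Stage 2: keep subset words at or above the subset-level average count.
    subset_score = mean(subset.values())
    return {w for w, n in subset.items() if n >= subset_score}


def suggest_questions(document, archive):
    """Return archived questions whose question or answer mentions a theme word."""
    themes = theme_words(document)

    def mentions_theme(text):
        return bool(themes & set(re.findall(r"[a-z]+", text.lower())))

    return [q for q, a in archive if mentions_theme(q) or mentions_theme(a)]


if __name__ == "__main__":
    document = ("solar panels convert sunlight. solar power is clean. "
                "solar panels last decades.")
    archive = [
        ("How long do solar cells last?", "About 25 years."),
        ("What is wind energy?", "Energy from moving air."),
        ("Which energy is cheapest?", "Often solar, depending on location."),
    ]
    print(suggest_questions(document, archive))
```

In the sample, the third archived question is returned because its stored answer, rather than the question itself, contains the theme word.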
223. A computerized question generation system comprising: computerized theme word identifying functionality for identifying at least one theme word in a document; computerized previous answer searching functionality operative to search for previously asked questions containing said at least one theme word or having previously generated answers containing said at least one theme word; and an output device for providing said previously asked questions.
224. A computerized question generation system according to claim 223 and wherein said computerized theme word identifying functionality is operative to carry out statistical analysis of the frequency of occurrence of words in said document.
225. A computerized question generation system according to claim 224 and wherein said computerized theme word identifying functionality comprises: computerized word stemming functionality, operative, for each word in said document, to stem said word to a corresponding root word; a word occurrence frequency score generator for generating a word occurrence frequency score for each different root word corresponding to a word in said document; computerized document word occurrence frequency indicating score calculating functionality operative to use said word occurrence frequency scores to calculate a document word occurrence frequency indicating score for said document; and computerized word selecting functionality operative to select a subset of words in said document including at least one word having a word occurrence frequency score which is greater than or equal to said document word occurrence frequency indicating score.
226. A computerized question generation system according to claim 225 and wherein said computerized document word occurrence frequency indicating score calculating functionality is operative to calculate said document word occurrence frequency indicating score by calculating at least one of an average of said word occurrence frequency scores and a median of said word occurrence frequency scores.
227. A computerized question generation system according to claim 225 and wherein said computerized theme word identifying functionality is operative to select, as said at least one theme word, at least one word having a word occurrence frequency score which is greater than or equal to twice said document word occurrence frequency indicating score.
228. A computerized question generation system according to claim 225 and wherein said computerized theme word identifying functionality also comprises: computerized subset word occurrence frequency indicating score calculating functionality, operative to calculate a subset word occurrence frequency indicating score; and computerized theme word selection functionality operative to select, as said at least one theme word, at least one of said subset of words having a word occurrence frequency score which is greater than or equal to said subset word occurrence frequency indicating score.
229. A computerized question generation system according to claim 228 and wherein said computerized subset word occurrence frequency indicating score calculating functionality is operative to calculate said subset word occurrence frequency indicating score by calculating at least one of an average of said word occurrence frequency scores of words in said subset of words and a median of said word occurrence frequency scores of words in said subset of words.
230. A computerized editable report precursor generating method comprising: inputting at least one question into a computer; employing said computer to obtain at least one answer to said at least one question; storing said at least one answer to said at least one question; presenting said at least one question to said at least one answer in an editable form on said computer as an editable report precursor; archiving a multiplicity of said editable report precursors; and following said archiving, employing said multiplicity of editable report precursors to enhance said employing said computer.
231. A computerized editable report precursor generating method according to claim 230 and wherein said archiving includes archiving edited versions of said multiplicity of editable report precursors and wherein said edited versions are also employed to enhance said employing said computer.
232. A computerized editable report precursor generating method according to claim 230 and wherein said inputting comprises inputting said at least one question to said computer by at least one of: typing said question; using a voice responsive input device; using a screen scraping functionality; using an email functionality; using an SMS functionality; and using an instant messaging functionality.
233. A computerized editable report precursor generating method according to claim 230 and wherein said employing said computer comprises: employing computerized answer retrieving functionality to generate document search terms including at least one additional search term not present in said question, which said additional search term was acquired, prior to receipt by said computer of said question from said user, by said computerized answer retrieving functionality in response to said at least one question; and operating computerized search engine functionality to access a set of documents in response to said question, based not only on at least one search term supplied by a user but also on said at least one additional search term provided by said at least one computerized answer retrieving functionality.
234. A computerized editable report precursor generating method comprising: inputting at least one desired report subject identifier into a computer; employing said computer to generate at least one question related to a desired subject identified by said at least one desired report subject identifier; employing said computer to obtain at least one answer to said at least one question; and presenting said at least one question to said at least one answer in an editable form on said computer, thereby providing an editable report precursor.
235. A computerized editable report precursor generating method according to claim 234 and also comprising: archiving a multiplicity of said editable report precursors; and following said archiving, employing said multiplicity of editable report precursors to enhance at least one of said employing said computer to generate at least one question and said employing said computer to obtain at least one answer to said at least one question.
236. A computerized editable report precursor generating method according to claim 234 and wherein said archiving includes archiving edited versions of said multiplicity of editable report precursors and wherein said edited versions are also employed to enhance at least one of said employing said computer to generate at least one question and said employing said computer to obtain at least one answer to said at least one question.
237. A computerized editable report precursor generating method according to claim 234 and wherein said inputting comprises inputting said at least one desired report subject identifier to said computer by at least one of: typing said desired report subject identifier; using a voice responsive input device; using a screen scraping functionality; using an email functionality; using an SMS functionality; and using an instant messaging functionality.
238. A computerized editable report precursor generating method according to claim 234 and wherein said employing said computer to generate said at least one question comprises employing said desired report subject identifier to search for previously asked questions containing at least part of said desired report subject identifier or having previously generated answers containing at least part of said desired report subject identifier.
239. A computerized editable report precursor generating method according to claim 234 and wherein said employing said computer comprises: employing computerized answer retrieving functionality to generate document search terms including at least one additional search term not present in said question, which said additional search term was acquired, prior to receipt by said computer of said desired report subject identifier from said user, by said computerized answer retrieving functionality in response to at least one query; operating computerized search engine functionality to access a set of documents in response to said question, based not only on said desired report subject identifier but also on said at least one additional search term provided by said at least one computerized answer retrieving functionality.